
An Infrastructure for Data Analysis Extraction in Distributed Systems

Ghashami, Mina | 2012

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 43279 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Habibi, Jafar; Mirian Hosseinabadi, Hassan
  7. Abstract:
     In distributed systems, a huge amount of data is dispersed among different nodes, and centralizing this data is infeasible because of communication and storage costs. In addition, databases with high-dimensional data objects are becoming more prevalent in many areas. As the dimensionality increases, the volume of the space grows so fast that the available data becomes sparse. This sparsity is problematic in several respects. To obtain a statistically sound and reliable result, the amount of data needed to support it often grows exponentially with the dimensionality. Moreover, organizing and searching data often relies on detecting regions where objects form groups with similar properties; in high-dimensional data, however, all objects appear sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. In these situations it is often beneficial to apply dimensionality reduction as a preprocessing step that maps the data to a lower-dimensional representation. When working with this reduced representation, tasks such as classification often yield more accurate results, while computational and storage costs may also be significantly reduced.

     In this thesis, an infrastructure is proposed for extracting data analyses from data distributed among the nodes of a system. The infrastructure is a distributed dimensionality reduction algorithm: it groups the data objects, builds a hierarchical structure on each group, and reduces the dimensionality of the objects in each group. Experimental results demonstrate the effectiveness of the proposed algorithm in preserving the local structure of the data set and in reducing computing and storage costs. All experiments were performed in MATLAB.
  8. Keywords:
     Distributed System; Clustering; Distributed Dimension Reduction; Locality Preserving; Infrastructure
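The claim that high-dimensional objects "appear sparse and dissimilar" can be illustrated with a small experiment that is not part of the thesis. The sketch draws points uniformly in the unit hypercube and measures the relative contrast (d_max - d_min) / d_min between one query point and the rest. The function name `relative_contrast` and the uniform-hypercube setup are illustrative assumptions.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(d_max - d_min) / d_min for distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim.
    Illustrative experiment only; not taken from the thesis."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

As `dim` grows, the contrast shrinks toward zero: the nearest and farthest points become almost equidistant from the query, so grouping objects by similarity carries less and less information.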
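The abstract describes the algorithm only as "group the data, then reduce each group". The following is a minimal sketch of that group-then-reduce idea under loud assumptions. Plain k-means stands in for the grouping mechanism, and the leading principal component (found by power iteration) stands in for the per-group reducer. The thesis's hierarchical structure and its locality-preserving reduction are not reproduced, and the names `kmeans`, `top_component`, and `group_and_reduce` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (assumed grouping step); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
        # Update step: move each non-empty group's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_component(points: List[Vector], iters: int = 100, seed: int = 0) -> Tuple[Vector, Vector]:
    """Mean and leading principal direction of `points`, via power iteration."""
    dim = len(points[0])
    mean = [sum(col) / len(points) for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, mean)] for p in points]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random start vector
    norm = math.sqrt(sum(a * a for a in v))
    v = [a / norm for a in v]
    for _ in range(iters):
        # w = (X^T X) v, computed as sum_i x_i (x_i . v) without forming X^T X.
        w = [0.0] * dim
        for x in centered:
            s = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += s * x[j]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:  # all points identical: any direction will do
            break
        v = [a / norm for a in w]
    return mean, v


def group_and_reduce(data: List[Vector], k: int) -> Tuple[List[int], List[float]]:
    """Hypothetical pipeline: group the data, then map each point to 1-D
    using only the statistics of its own group."""
    labels = kmeans(data, k)
    reduced = [0.0] * len(data)
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        mean, v = top_component([data[i] for i in idx])
        for i in idx:
            # Coordinate of the point along its group's leading direction.
            reduced[i] = sum((x - m) * a for x, m, a in zip(data[i], mean, v))
    return labels, reduced
```

Because each group is reduced using only its own members, the per-group work is independent and can run separately. Distances within a group survive the projection well when that group is close to low-dimensional, which is one simple reading of "preserving the local structure".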
