Distributed Data Mining in Peer-to-Peer Systems

Mashayekhi, Hoda | 2013

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 44385 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Habibi, Jafar
  7. Abstract:
  8. Peer-to-peer (P2P) computing is a popular distributed computing paradigm for many applications that involve the exchange of information among a large number of peers. In such applications, large amounts of data are distributed among multiple dispersed sources, so data analysis is challenging due to processing, storage, and transmission costs. Moreover, the data rarely remains static, and frequent data changes quickly outdate previously extracted data mining models. Distributed data mining deals with the problem of data analysis in environments with distributed data and computing resources. In this dissertation, we explore distributed data mining in different structures of P2P systems. In structured P2P systems, L-overlay is proposed for indexing data and processing complex queries. The overlay is later used for K-nearest neighbor and Naive Bayes classification. In unstructured P2P systems, gossiping proves to be an effective yet simple means of communication, which can also adapt to dynamics in the system. This communication paradigm enabled us to devise GoSCAN, a decentralized density-based clustering method that is adaptive to churn. The model is further extended to the novel decentralized algorithm GDCluster, which, to the best of our knowledge, is the first truly decentralized and adaptable clustering method applicable to different clustering algorithms. The proposed methods are scalable and adapt incrementally in the presence of dynamics. Analysis of the algorithms and extensive simulation results show the robustness, effectiveness, and scalability of the proposed methods under static and dynamic settings, with different data assignment strategies. State-of-the-art methods such as SSW and LSP2P are also employed for comparison.
  9. Keywords:
  10. Data Mining ; Data Management ; Distributed Algorithm ; Distributed Clustering ; Dynamical Systems ; Peer-to-Peer Network ; Overlay Network
