Mining distributed frequent itemsets using a gossip based protocol

Bagheri, M ; Sharif University of Technology | 2012

  1. Type of Document: Article
  2. DOI: 10.1109/UIC-ATC.2012.136
  3. Publisher: IEEE, 2012
  4. Abstract:
  5. Frequent itemset mining in distributed systems has recently attracted growing interest. In this paper, we present an algorithm for extracting frequent itemsets from large distributed datasets. Our algorithm uses gossip as its communication mechanism and does not rely on any central node. In gossip-based communication, nodes repeatedly select other nodes in the system at random and exchange information with them. Our algorithm proceeds in rounds and provides every node with the required support counts of itemsets, so that each node is able to extract the global frequent itemsets. A trie data structure is used for local iteration and for generating candidate itemsets, which simplifies the process and reduces execution time. We further propose an improvement to our algorithm that groups nodes and arranges them into a hierarchical structure. Performing aggregation tasks within groups effectively reduces communication overhead. We evaluate our proposal by simulation and show the advantages of our algorithms in reducing execution time and communication overhead while preserving accuracy.
  6. Keywords:
  7. Communication mechanisms ; Communication overheads ; Data sets ; Distributed systems ; Execution time ; Frequent itemset mining ; Global frequent itemsets ; Gossip ; Gossip-based protocol ; Hierarchical structures ; Item sets ; Support count ; Trie ; Algorithms ; Communication ; Cost reduction ; Data mining ; Data structures ; Ubiquitous computing ; Iterative methods
  8. Source: Proceedings - IEEE 9th International Conference on Ubiquitous Intelligence and Computing and IEEE 9th International Conference on Autonomic and Trusted Computing, UIC-ATC 2012 ; 2012 , Pages 780-785
  9. URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6332083
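
Illustrative note: the abstract describes nodes that gossip in rounds until each one holds the support counts needed to extract the global frequent itemsets. The sketch below shows one way such an exchange could work. It is not the paper's protocol. It assumes plain push-pull averaging, in which two nodes replace their values with the pairwise mean. The function names, the `SIZE_KEY` marker, the toy data, and the restriction to itemsets of size one and two are all hypothetical, and the trie-based candidate generation and the hierarchical grouping are omitted.

```python
import random
from itertools import combinations

SIZE_KEY = "__n__"  # hypothetical marker for the local transaction count


def local_counts(transactions, candidates):
    """Count, on one node, how many local transactions contain each candidate."""
    state = {c: float(sum(1 for t in transactions if c <= t)) for c in candidates}
    state[SIZE_KEY] = float(len(transactions))
    return state


def gossip_round(states, rng):
    """One round: every node averages its state with a random other node.

    Pairwise averaging keeps the sum over all nodes unchanged, so every
    node drifts toward the network-wide mean of each count.
    """
    n = len(states)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip self so that j != i
        for key in states[i]:
            mean = (states[i][key] + states[j][key]) / 2.0
            states[i][key] = mean
            states[j][key] = mean


def frequent_itemsets(state, min_support):
    """Mean count divided by mean database size estimates the global support."""
    size = state[SIZE_KEY]
    return {c for c, v in state.items() if c != SIZE_KEY and v / size >= min_support}


def run_demo(seed=0, rounds=50, min_support=0.4):
    """Simulate four nodes holding a made-up partition of ten transactions."""
    rng = random.Random(seed)
    partitions = [
        [{"a", "b"}, {"a", "b", "c"}, {"a"}],
        [{"b", "c"}, {"a", "b"}],
        [{"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"}],
        [{"c"}],
    ]
    items = sorted(set().union(*(t for p in partitions for t in p)))
    candidates = [frozenset(c) for k in (1, 2) for c in combinations(items, k)]
    states = [local_counts(p, candidates) for p in partitions]
    for _ in range(rounds):
        gossip_round(states, rng)
    return [frequent_itemsets(s, min_support) for s in states], states
```

Calling `run_demo()` returns, for each simulated node, the itemsets it judges frequent, together with the final per-node states. Counts and database sizes are averaged separately because the nodes hold different numbers of transactions, so averaging local support fractions directly would not give the global support.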