
Fundamental Bounds for Clustering of Bernoulli Mixture Models

Behjati, Amin | 2023

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56632 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Motahari, Abolfazl
  7. Abstract: A random vector whose binary components are mutually independent is called a Bernoulli random vector. A Bernoulli Mixture Model (BMM) is a combination of a finite number of Bernoulli models, in which each sample is generated by one of these models chosen at random. The key challenge is to estimate the parameters of a BMM or to cluster the samples according to the models that generated them. This problem has applications in bioinformatics, image recognition, text classification, social networks, and other areas. In bioinformatics, for example, it arises when clustering ethnic groups based on genetic data. Many studies have proposed algorithms for this problem without analyzing their theoretical accuracy, and few of them address clustering explicitly. This research presents a method for error-free clustering of samples generated by a mixture of two Bernoulli models, together with theoretical correctness bounds that have not been examined before. The time complexity of the method is O(n log n + nd^2 + d'^3), where n is the number of samples, d is the dimension of the sample space, and d' is the number of informative dimensions (dimensions in which the parameters of the two models differ by at least λ). If the number of samples is Ω(log d) and the number of informative dimensions is Ω(log n), the method clusters all samples without error with high probability.
  8. Keywords: Statistical Learning Method; Parameter Estimation; Clustering; Bernoulli Mixture Models; Bioinformatics; Information Retrieval
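The abstract does not describe the clustering algorithm itself, so the sketch below does not reproduce the thesis method. It only illustrates the model and the notion of "informative dimensions" under stated assumptions. It samples from a two-component Bernoulli mixture, finds the dimensions where the two parameter vectors differ by at least λ, and assigns each sample with a log-likelihood-ratio test restricted to those dimensions. That test is an *oracle* baseline: it assumes the true parameters p and q are known, with 0 < p_j, q_j < 1, and it assumes equal mixing weights so the prior term cancels. A real clustering method has to work without these parameters. All function names (`sample_bmm`, `informative_dims`, `oracle_assign`) are hypothetical.

```python
import math
import random


def sample_bmm(n, p, q, weight=0.5, seed=0):
    """Draw n binary vectors from a two-component Bernoulli mixture (hypothetical helper).

    Component 0 has per-dimension success probabilities p, component 1 has q.
    Each sample comes from component 0 with probability `weight`.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        z = 0 if rng.random() < weight else 1
        params = p if z == 0 else q
        # Coordinates are independent Bernoulli draws (a Bernoulli random vector).
        x = [1 if rng.random() < pj else 0 for pj in params]
        samples.append(x)
        labels.append(z)
    return samples, labels


def informative_dims(p, q, lam):
    """Indices j where the two components' parameters differ by at least lam."""
    return [j for j, (pj, qj) in enumerate(zip(p, q)) if abs(pj - qj) >= lam]


def oracle_assign(x, p, q, dims):
    """Assign x to the component with the higher log-likelihood over `dims`.

    Assumes known p, q with 0 < p_j, q_j < 1 and equal mixing weights.
    This is an oracle baseline, not a parameter-free clustering method.
    """
    llr = 0.0
    for j in dims:
        if x[j]:
            llr += math.log(p[j]) - math.log(q[j])
        else:
            llr += math.log(1 - p[j]) - math.log(1 - q[j])
    return 0 if llr >= 0 else 1
```

As a usage example, take d = 200 dimensions, where the components agree on the first 100 (p_j = q_j = 0.2) and differ by 0.6 on the last 100. `informative_dims(p, q, 0.5)` then returns indices 100 to 199. With that many informative dimensions, the oracle test separates 500 samples with essentially no errors. This matches the intuition behind the abstract's condition that the number of informative dimensions grow like log n.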
