
Private Inner Product Retrieval for Distributed Machine Learning

Mousavi, M. H.; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1109/ISIT.2019.8849347
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2019
  4. Abstract: In this paper, we argue that many basic machine learning algorithms, including the support vector machine (SVM) for classification, principal component analysis (PCA) for dimensionality reduction, and regression for dependency estimation, need only the inner products of the data samples rather than the data samples themselves. Motivated by this observation, we introduce the problem of private inner product retrieval for distributed machine learning, in which a database of files is duplicated across several non-colluding servers. A user intends to retrieve a subset, of a specific size, of the set of inner products of every pair of data items in the database with minimum communication load, without revealing any information about the identity of the requested subset. For achievability, we use algorithms for multi-message private information retrieval. For the converse, we establish that as the length of the files grows, the set of all inner products converges to independent, uniformly distributed random variables; hence we obtain the asymptotic capacity of this problem. We also derive the rate of this convergence. To prove it, we construct special dependencies among the sequences of sets of all inner products at different lengths, which form a time-homogeneous irreducible Markov chain without affecting the marginal distribution. We show that this Markov chain has the uniform distribution as its unique stationary distribution, with a rate of convergence dominated by the second-largest eigenvalue of the transition probability matrix. This allows us to develop a converse bound that becomes tight in some cases as the size of the files grows. © 2019 IEEE [Illustrative sketches of the Gram-matrix observation and the spectral convergence bound follow this record.]
  5. Keywords: Classification (of information); Distributed database systems; Eigenvalues and eigenfunctions; Information theory; Markov processes; Principal component analysis; Probability distributions; Search engines; Support vector machines; Asymptotic capacities; Dimensionality reduction; Distributed machine learning; Independent random variables; Private information retrieval; Second largest eigenvalue; Stationary distribution; Transition probability matrix; Machine learning
  6. Source: 2019 IEEE International Symposium on Information Theory (ISIT 2019), 7-12 July 2019; Volume 2019-July, 2019, Pages 355-359; 2157-8095 (ISSN); 978-1-5386-9291-2 (ISBN)
  7. URL: https://ieeexplore.ieee.org/document/8849347
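
As an illustration of the paper's starting observation (a sketch under stated assumptions, not code from the paper): principal-component scores can be recovered from the Gram matrix of pairwise inner products alone, so a user who privately retrieves the inner products never needs the raw samples. A minimal numpy sketch, with toy data and illustrative names:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))    # toy data: 100 samples, 5 features
  Xc = X - X.mean(axis=0)          # center the samples

  # Gram matrix of all pairwise inner products -- the only quantity a
  # private inner product retrieval scheme has to deliver to the user.
  G = Xc @ Xc.T

  # PCA from G alone: with the SVD Xc = U S V^T we have G = U S^2 U^T,
  # and the principal-component scores are Xc V = U S.
  eigvals, U = np.linalg.eigh(G)
  top = np.argsort(eigvals)[::-1][:2]
  scores_from_gram = U[:, top] * np.sqrt(eigvals[top])

  # Sanity check against PCA run directly on the raw samples;
  # columns agree up to a sign flip.
  _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
  scores_direct = Xc @ Vt[:2].T
  assert np.allclose(np.abs(scores_from_gram), np.abs(scores_direct))

The same point holds for the dual form of the SVM and for linear regression via the normal equations, both of which depend on the data only through such inner products.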
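The convergence rate cited in the abstract is consistent with the standard spectral bound for a finite, irreducible, aperiodic, time-homogeneous Markov chain with a diagonalizable transition matrix P (a textbook bound, not the paper's exact statement). Writing \pi for the unique stationary distribution (here uniform),

  \[
    \bigl\| P^{t}(x,\cdot) - \pi \bigr\|_{\mathrm{TV}} \;\le\; C(x)\,\lvert\lambda_2\rvert^{t},
  \]

where \lambda_2 is the second-largest eigenvalue of P in modulus, x is the initial state, and C(x) is a constant depending on x; the distance to the uniform distribution thus decays geometrically at rate \lvert\lambda_2\rvert, matching the role the abstract assigns to the second-largest eigenvalue.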