Speech signal modeling using multivariate distributions

Aroudi, A ; Sharif University of Technology | 2015

  1. Type of Document: Article
  2. DOI: 10.1186/s13636-015-0078-1
  3. Publisher: Springer International Publishing, 2015
  4. Abstract:
  5. Using a proper distribution function for the speech signal or for its representations is of crucial importance in statistically based speech processing algorithms. Although the most commonly used probability density function (pdf) for speech signals is Gaussian, recent studies have shown the superiority of super-Gaussian pdfs. A large research effort has focused on the univariate case of the speech signal distribution; in this paper, however, we study the multivariate distributions of the speech signal and its representations, using conventional distribution functions (e.g., multivariate Gaussian and multivariate Laplace) and copula-based multivariate distributions as candidates. The copula-based technique is a powerful method for modeling non-Gaussian multivariate distributions with non-linear inter-dimensional dependency. The level of similarity between the candidate pdfs and the real speech pdf in different domains is evaluated using the energy goodness-of-fit test. In our evaluations, the best-fitted distributions for speech signal vectors of different lengths in various domains are determined. A similar experiment is performed for different classes of English phonemes (fricatives, nasals, stops, vowels, and semivowels/glides). The evaluation results demonstrate that the multivariate distribution of speech signals in different domains is mostly super-Gaussian, except for Mel-frequency cepstral coefficients (MFCCs). The results also confirm that the distribution of the different phoneme classes is better modeled statistically by a mixture of Gaussian and Laplace pdfs. The copula-based distributions provide better statistical modeling of vectors representing the discrete Fourier transform (DFT) amplitude of speech vectors with a length shorter than 500 ms.
  6. Keywords:
  7. Copula-based multivariate distribution ; Discrete Fourier transform (DFT) ; Linear predictive coefficient (LPC) ; Mel-frequency cepstral coefficient (MFCC) ; Multivariate distribution of speech signal ; Design for testability ; Discrete cosine transforms ; Discrete Fourier transforms ; Distribution functions ; Function evaluation ; Gaussian distribution ; Laplace transforms ; Mathematical transformations ; Probability density function ; Speech ; Speech processing ; Speech recognition ; Discrete cosine transform (DCT) ; Goodness of fit (GOF) test ; Linear predictive coefficients ; Mel-frequency cepstral coefficients ; Multivariate distributions ; Speech signals ; Speech communication
  8. Source: EURASIP Journal on Audio, Speech, and Music Processing ; Volume 2015, Issue 1, 2015, Pages 1-14 ; 1687-4714 (ISSN)
  9. URL: http://asmp.eurasipjournals.springeropen.com/articles/10.1186/s13636-015-0078-1
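
The abstract scores candidate pdfs against real speech data with the energy goodness-of-fit test. As a rough illustration of the idea only, the sketch below computes a two-sample energy distance between a set of observed vectors and samples drawn from a candidate distribution; a smaller value suggests a closer fit. This is not the authors' implementation. The function names, the synthetic stand-in for speech vectors, and the use of independent Laplace components in place of a true multivariate Laplace are all assumptions made for the example.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (a_i, b_j)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Two-sample energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|.

    x and y are arrays of shape (n_samples, n_dims). This V-statistic form
    is non-negative and is exactly zero when x and y are the same sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


rng = np.random.default_rng(0)
n, dim = 200, 3

# Hypothetical stand-in for observed speech vectors (NOT real speech data):
# independent unit-scale Laplace components, i.e. a super-Gaussian sample.
observed = rng.laplace(loc=0.0, scale=1.0, size=(n, dim))

# Candidate 1: Gaussian with matched variance (a unit-scale Laplace has variance 2).
gauss_candidate = rng.normal(loc=0.0, scale=np.sqrt(2.0), size=(n, dim))

# Candidate 2: independent Laplace components (an assumption, see lead-in).
laplace_candidate = rng.laplace(loc=0.0, scale=1.0, size=(n, dim))

print("energy distance to Gaussian candidate:", energy_distance(observed, gauss_candidate))
print("energy distance to Laplace candidate: ", energy_distance(observed, laplace_candidate))
```

With samples this small the two distances can be close, so a real comparison would need more data and a significance procedure (for example, a permutation test) rather than a single pair of numbers.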