Search for: speech-signals
Total 28 records

    Steganography in silence intervals of speech

    , Article 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, Harbin, 15 August 2008 through 17 August 2008 ; 2008 , Pages 605-607 ; 9780769532783 (ISBN) Shirali Shahreza, S ; Shirali Shahreza, M ; Sharif University of Technology
    This paper presents a new approach for hiding information in speech signals. In this method, the silence intervals of speech are found and the length (number of samples) of these intervals is changed to hide information. This method can be used simultaneously with other methods. © 2008 IEEE  
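    The abstract says only that the lengths of silence intervals are changed. A minimal sketch, assuming that "silence" means a run of at least `min_len` samples whose amplitude is below a threshold `thr`, and that each bit is stored in the parity of a run's length (both are hypothetical choices, not the authors' rule):

```python
def find_silences(x, thr, min_len):
    """Return (start, length) of every run of >= min_len samples with |x| < thr."""
    runs, start = [], None
    for i, v in enumerate(x):
        if abs(v) < thr:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    if start is not None and len(x) - start >= min_len:
        runs.append((start, len(x) - start))
    return runs


def embed_bits(x, bits, thr=0.01, min_len=50):
    """Hide one bit per silence interval in the parity of its length (assumed scheme)."""
    runs = find_silences(x, thr, min_len)
    if len(bits) > len(runs):
        raise ValueError("not enough silence intervals for the message")
    out = list(x)
    # Process intervals from last to first so earlier start indices stay valid.
    for (start, length), bit in reversed(list(zip(runs, bits))):
        if length % 2 != bit:
            out.insert(start, 0.0)  # one extra silent sample flips the parity
    return out


def extract_bits(y, n_bits, thr=0.01, min_len=50):
    """Read the parity of the first n_bits silence intervals."""
    return [length % 2 for _, length in find_silences(y, thr, min_len)[:n_bits]]
```

    Because the embedder only ever lengthens intervals, no interval can fall below `min_len`, so the decoder finds the same intervals in the same order.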

    Audio segmentation and classification based on a selective analysis scheme

    , Article Proceedings - 10th International Multimedia Modelling Conference, MMM 2004, Brisbane, 5 January 2004 through 7 January 2004 ; 2004 , Pages 42-48 ; 0769520847 (ISBN); 9780769520841 (ISBN) Ghaemmaghami, S ; Sharif University of Technology
    This paper addresses a new approach to segmentation and classification of audio through analysis of a smaller set of selective frames, which are identified by temporal decomposition (TD). These frames are located at the most steady instants, or event centroids, within a given block of the signal, which yield the maximal diversity over the set of selected features. Based on this selection scheme, the number of frames used in the analysis is reduced by at least 40%, while the temporal resolution is doubled as compared to that in typical audio classifiers. By constructing a classification system to segment audio into speech, music, speech-music, and others, it is shown that the proposed method... 

    Single-Channel Speech Dereverberation in Noisy Acoustical Environments

    , M.Sc. Thesis Sharif University of Technology Joorabchi, Marjan (Author) ; Ghorshi, Mohammad Ali (Supervisor)
    In speech processing, reflections of sound waves in a bounded space are referred to as speech reverberation. Although these reflections are useful for musical instruments and their recording devices, they cause serious problems for other applications that receive them along with the speech signal. Reverberation degrades speech, reducing both its intelligibility and its quality. Dereverberation algorithms are essential for Automatic Speech Recognition (ASR), telecommunication, and hearing aid devices, which are some of the most widely used applications. While dereverberation itself is a challenging problem, dereverberating a speech signal recorded by only one microphone (channel)... 

    PIRS: Pseudo inversion based recovery of speech signals

    , Article ISSPIT 2007 - 2007 IEEE International Symposium on Signal Processing and Information Technology, Cairo, 15 December 2007 through 18 December 2007 ; 2007 , Pages 285-290 ; 9781424418350 (ISBN) Ajorloo, H ; Lakdashti, A ; Manzuri Shalmani, M. T ; Sharif University of Technology
    Communication of speech over error-prone channels such as wireless channels and the Internet usually suffers from the loss of a large number of adjacent samples. In this paper, we propose introducing artificial correlation between speech samples, which distorts the signal. By choosing appropriate parameters, one can keep this distortion within acceptable ranges. Using this correlation, the receiver can recover lost samples up to a certain limit using our proposed algorithm. Experimental results show that our solution outperforms a previous one reported in the literature, especially when the number of lost samples is below this limit. ©2007 IEEE  
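    One way to realize such artificial correlation is to project each block of samples onto a lower-dimensional subspace. The sketch below assumes a truncated orthonormal DCT basis, which is a swapped-in choice and not necessarily the transform used in the paper. Keeping `keep` of `n` coefficients sets the distortion, and a least-squares (pseudo-inverse) fit on the received samples then recovers up to `n - keep` lost samples:

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c


def correlate_block(x, keep):
    """Pre-distort a block by discarding its high-order DCT coefficients (assumed scheme)."""
    c = dct_matrix(len(x))
    return c[:keep].T @ (c[:keep] @ x)


def recover_block(received, lost_mask, keep):
    """Pseudo-inverse recovery of the samples flagged in lost_mask."""
    basis = dct_matrix(len(received))[:keep].T   # n x keep synthesis matrix
    ok = ~lost_mask
    coeffs, *_ = np.linalg.lstsq(basis[ok], received[ok], rcond=None)
    out = received.copy()
    out[lost_mask] = basis[lost_mask] @ coeffs
    return out
```

    A smaller `keep` tolerates longer losses but discards more of the signal, which mirrors the trade-off between distortion and recovery limit described in the abstract.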

    LPRE: Lost speech packet recovery with enhancement

    , Article 2007 IEEE International Conference on Communications, ICC'07, Glasgow, Scotland, 24 June 2007 through 28 June 2007 ; August , 2007 , Pages 1778-1783 ; 05361486 (ISSN); 1424403537 (ISBN); 9781424403530 (ISBN) Ajorloo, H ; Manzuri Shalmani, M. T ; Sharif University of Technology
    In Internet telephony, the loss of IP packets causes instantaneous discontinuities in the received speech. In this paper, we focus on finding an error-resilient method for this problem. Our proposed method creates artificial correlation between speech samples that pre-distorts the speech signal. The receiver uses this correlation to reconstruct the lost speech packets. An appropriate speech enhancement technique is designed to reduce the processing error in the recovered speech caused by the speech codecs. The SegSNR results show the superiority of our proposed speech enhancement method over a recently proposed one. © 2007 IEEE  
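    Segmental SNR (SegSNR) averages per-frame SNRs instead of computing one global ratio, so quiet frames weigh as much as loud ones. A minimal sketch; the 256-sample frame length and the commonly used [-10, 35] dB clipping range are assumptions, not values taken from the paper:

```python
import numpy as np


def seg_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Segmental SNR in dB: mean of per-frame SNRs, each clipped to [lo, hi]."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = len(clean) // frame_len  # a trailing partial frame is ignored
    snrs = []
    for f in range(n_frames):
        s = clean[f * frame_len:(f + 1) * frame_len]
        e = s - processed[f * frame_len:(f + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        snrs.append(min(max(snr, lo), hi))
    return float(np.mean(snrs))
```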

    A new method for separation of speech signals in convolutive mixtures

    , Article 13th European Signal Processing Conference, EUSIPCO 2005, Antalya, 4 September 2005 through 8 September 2005 ; 2005 , Pages 2210-2213 ; 1604238216 (ISBN); 9781604238211 (ISBN) Ferdosizadeh, M ; Babaie Zadeh, M ; Marvasti, F. A ; Sharif University of Technology
    In this paper, the performance of the gradient method based on the Score Function Difference (SFD) in separating i.i.d. and periodic signals is investigated. We show that this algorithm separates periodic signals better than i.i.d. ones. Using this experimental result and the fact that voiced frames of speech signals are approximately periodic, a modified algorithm named VDGaradient is proposed for the separation of speech signals in synthetic convolutive mixtures. In this method, the voiced frames of the speech signals are used as the input to the gradient method, and the resulting separating system is then applied to separate the sources completely.  
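    The method relies on picking out the approximately periodic (voiced) frames. The abstract does not say how they are detected; the sketch below assumes a simple normalized-autocorrelation test with a hypothetical pitch range `fmin`..`fmax` and threshold `thresh`, and covers only the frame selection, not the SFD gradient step itself:

```python
import numpy as np


def is_voiced(frame, fs, fmin=60.0, fmax=400.0, thresh=0.5):
    """Treat a frame as voiced if its normalized autocorrelation has a strong
    peak at some lag inside the assumed pitch range [fmin, fmax] Hz."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return False
    lo = max(1, int(fs / fmax))
    hi = min(int(fs / fmin), len(x) - 1)
    r = [np.dot(x[:-lag], x[lag:]) / energy for lag in range(lo, hi + 1)]
    return len(r) > 0 and bool(max(r) > thresh)


def select_voiced(signal, fs, frame_len):
    """Split a signal into frames and keep only the voiced ones."""
    n = len(signal) // frame_len
    frames = [signal[i * frame_len:(i + 1) * frame_len] for i in range(n)]
    return [f for f in frames if is_voiced(f, fs)]
```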

    CIROLS: Codec independent recovery of lost speech packets

    , Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Ajorloo, H ; Manzuri Shalmani, M. T ; Aghatabar, M. M ; Sharif University of Technology
    In this paper, we focus on finding an error-resilient method for discontinuity-free transmission of speech signals over the Internet. Our proposed method creates artificial correlation between speech samples that pre-distorts the speech signal. The receiver uses this correlation to reconstruct the lost speech packets. A discrete Fourier transform (DFT)-based speech enhancement technique is designed to reduce the processing error in the recovered speech caused by the speech codecs. The SegSNR results show the superiority of our proposed method over a recently proposed speech enhancement technique. ©2007 IEEE  

    Tamper Detection and Self Recovery of Speech Signals Based on Self Embedding

    , M.Sc. Thesis Sharif University of Technology Kasraei, Fatemeh (Author) ; Marvasti, Farrokh (Supervisor)
    There is a vast literature on various types of tampering detection. In this thesis, we propose two methods for tampering detection and recovery of speech signals in WAV format. Owing to the redundancy of speech signals, a watermarking technique is used for this purpose. On the embedder side, the watermark is generated from a similar version of the original signal. In the first method, the watermark is generated from the downsampled signal, and the output of a hash generator together with some framing headers is used to detect the tampered region automatically. In the second method, the original signal is compressed using a source encoder. The output is then packetized to detect the tampered... 
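    The detection half of such a self-embedding scheme can be illustrated with a fragile watermark: hash each frame with its least significant bits cleared, then write the hash bits into those same LSBs. This is only a sketch under stated assumptions: 16-bit PCM samples, a hypothetical 64-sample frame, and SHA-256 as the hash. It does not reproduce the thesis's downsampled or compressed watermark, so it detects tampering but cannot recover the content:

```python
import hashlib
import struct

FRAME = 64  # hypothetical frame length in samples (must be <= 256 hash bits)


def _frame_bits(frame):
    """SHA-256 over the frame with LSBs cleared, returned as len(frame) bits."""
    data = struct.pack(f"<{len(frame)}h", *[s & ~1 for s in frame])
    digest = hashlib.sha256(data).digest()
    bits = [(byte >> k) & 1 for byte in digest for k in range(8)]
    return bits[:len(frame)]


def embed(samples):
    """Write each frame's hash bits into the LSBs of that same frame."""
    out = list(samples)
    for start in range(0, len(out) - FRAME + 1, FRAME):  # trailing partial frame skipped
        frame = out[start:start + FRAME]
        for i, b in enumerate(_frame_bits(frame)):
            out[start + i] = (frame[i] & ~1) | b
    return out


def tampered_frames(samples):
    """Indices of frames whose LSBs no longer match their own content hash."""
    bad = []
    for idx, start in enumerate(range(0, len(samples) - FRAME + 1, FRAME)):
        frame = samples[start:start + FRAME]
        if [s & 1 for s in frame] != _frame_bits(frame):
            bad.append(idx)
    return bad
```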

    Sparse Representation-based Classification and Application to Image and Speech Processing

    , M.Sc. Thesis Sharif University of Technology Nazari, Milad (Author) ; Babaeizadeh, Masoud (Supervisor)
    Pattern recognition is a branch of machine learning that identifies patterns and regularities in data. Many classifiers have been designed for this purpose, and their number grows daily. A classification method that has attracted much attention in recent years is Sparse Representation-based Classification (SRC). SRC, which combines machine learning and compressed sensing, was used for face recognition and showed good classification performance on face image data. However, SRC could not classify some other datasets well. Methods such as Block-SRC, Weighted-SRC, and Kernel-SRC have been proposed to classify different types of datasets, maintaining the highest accuracy on... 
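    The core SRC idea is to code a test sample sparsely over a dictionary whose columns are the training samples, then assign the class whose coefficients alone give the smallest reconstruction residual. SRC is usually formulated with ℓ1 minimization; the sketch below swaps in orthogonal matching pursuit (OMP) as the sparse coder, and the sparsity level `k` is a hypothetical parameter:

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: k-sparse x with D @ x ~= y (k >= 1)."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x


def src_classify(train, labels, y, k=5):
    """Sparse-representation classification by smallest class-wise residual."""
    D = train / np.linalg.norm(train, axis=0)      # unit-norm training columns
    x = omp(D, y, k)
    labels = np.asarray(labels)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)         # keep only class-c coefficients
        res = np.linalg.norm(y - D @ xc)
        if res < best_res:
            best, best_res = c, res
    return best
```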

    Parametric dictionary learning using steepest descent

    , Article ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 14 March 2010 through 19 March 2010 ; March , 2010 , Pages 1978-1981 ; 15206149 (ISSN) ; 9781424442966 (ISBN) Ataee, M ; Zayyani, H ; Babaie Zadeh, M ; Jutten, C ; Sharif University of Technology
    In this paper, we propose using a steepest descent algorithm to learn a parametric dictionary in which the structure, or atom functions, are known in advance. The structure of the atoms allows us to find a steepest descent direction for the parameters instead of for the dictionary itself. We also use a thresholded version of the Smoothed-ℓ0 (SL0) algorithm for the sparse representation step of our method. Our simulation results show that using an atom structure similar to Gabor functions and learning the parameters of these Gabor-like atoms yields better representations of our noisy speech signal than non-parametric dictionary learning methods like K-SVD, in... 

    Real-time and MPEG-1 layer III compression resistant steganography in speech

    , Article IET Information Security ; Volume 4, Issue 1 , 2010 , Pages 1-7 ; 17518709 (ISSN) Shirali Shahreza, M. H ; Shirali Shahreza, S ; Sharif University of Technology
    Embedding a secret message into a cover medium without attracting any attention, known as steganography, is one of the methods used for hidden communication. One of the cover media that can be used for steganography is speech. In this study, the authors propose a new steganography method for speech signals. In this method, the silence intervals of speech are found and the length (number of samples) of these intervals is changed to hide information. The main feature of our method is robustness to MPEG-1 layer III (MP3) compression. This method can hide information in a speech stream with very low processing time, which makes it a real-time steganography method. The hiding capacity of... 

    Robust multiplicative audio and speech watermarking using statistical modeling

    , Article 2009 IEEE International Conference on Communications, ICC 2009, Dresden, 14 June 2009 through 18 June 2009 ; 2009 ; 05361486 (ISSN); 9781424434350 (ISBN) Akhaee, M. A ; Khademi Kalantari, N ; Marvasti, F ; Sharif University of Technology
    In this paper, a semi-blind multiplicative watermarking approach for audio and speech signals is presented. At the receiver end, the optimal Maximum Likelihood (ML) detector, aided by channel side information, is designed and implemented for Gaussian and Laplacian signals in noisy environments. The performance of the proposed scheme is calculated analytically and verified by simulation. We then adapt the proposed scheme to speech and audio signals. To improve robustness, the algorithm is applied to the low-frequency components of the host signal. In addition, the watermark power is controlled to ensure inaudibility using Perceptual Evaluation of Audio Quality (PEAQ) and... 

    Training-Based Speech Enhancement Using Non-Gaussian Distributions

    , M.Sc. Thesis Sharif University of Technology Golrasan, Elham (Author) ; Sameti, Hossein (Supervisor)
    Statistical approaches (purely statistical and model-based) are the most efficient methods in single-channel speech enhancement. Despite their efficiency, speech enhancement remains a challenge. Recent research suggests that univariate non-Gaussian distributions are more appropriate for speech signals in different domains. Based on these univariate distributions, statistical approaches have been modified, and better results have consequently been reported. The purpose of this thesis is speech enhancement based on a hidden Markov model using multivariate non-Gaussian distributions. The results of the speech enhancement algorithm based on a hidden Markov model in the DCT and DFT domains... 

    Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge

    , Article Computer Speech and Language ; Volume 64 , 2020 Adiban, M ; Sameti, H ; Shehnepoor, S ; Sharif University of Technology
    Academic Press  2020
    Automatic Speaker Verification (ASV) is the authentication of individuals by analyzing their speech signals. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), using techniques that either imitate a voice or reconstruct its features. Attackers defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last technique is considered a common and high-potential tool for spoofing, since replay attacks are more accessible and require no technical knowledge on the part of adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we use the Constant Q Cepstral Coefficient (CQCC)... 

    Adaptive sparse source separation with application to speech signals

    , Article 2007 IEEE International Conference on Signal Processing and Communications, ICSPC 2007, Dubai, 24 November 2007 through 27 November 2007 ; 2007 , Pages 640-643 ; 9781424412365 (ISBN) Azizi, E ; Mohimani, G. H ; Babaie Zadeh, M ; Sharif University of Technology
    In this paper, a sparse component analysis algorithm is presented for the case in which the number of sources is less than or equal to the number of sensors but the channel (mixing matrix) is time-varying. The method uses a smoothed l0 norm as the sparsity criterion and takes advantage of the idea that the sparsity of the sources decreases when they are mixed. The method is able to separate synthetic and speech data and requires only very weak sparsity restrictions. It can separate up to 50 mixed signals while adapting to channel variation and remaining robust against noise. © 2007 IEEE  

    Separation of speech sources in under-determined case using SCA and time-frequency methods

    , Article 2008 International Symposium on Telecommunications, IST 2008, Tehran, 27 August 2008 through 28 August 2008 ; 2008 , Pages 533-538 ; 9781424427512 (ISBN) Mahdian, R ; Babaiezadeh, M ; Jutten, C ; Sharif University of Technology
    This paper presents a new algorithm for Blind Source Separation (BSS) of instantaneous speech mixtures in the under-determined case. A demixing algorithm that exploits the sparsity of speech signals in the short-time Fourier transform (STFT) domain is proposed. This algorithm combines the modified k-means clustering procedure used in the Line Orientation Separation Technique (LOST) with the Smoothed l0-norm minimization (SL0) method. The first procedure, together with a transformation into a sparse domain, estimates the mixing matrix, and the second method extracts the sources from the mixtures. Simulation results are presented and compared to the Degenerate Unmixing Estimation Technique... 
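    The source-extraction step can be sketched with a minimal SL0 loop: start from the minimum-norm solution, take gradient steps on a Gaussian-smoothed l0 measure that pull small entries toward zero, project back onto the constraint set, and gradually shrink the smoothing width `sigma`. In the paper the matrix would be the mixing matrix estimated by the clustering step and the vector one STFT-domain observation; here both are assumed to be given, and the parameter defaults are assumptions rather than values from the paper:

```python
import numpy as np


def sl0(A, b, sigma_min=1e-4, sigma_decrease=0.5, mu0=2.0, inner_iters=3):
    """Smoothed-l0: a sparse s with A @ s = b, for A with more columns than rows."""
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ b                        # minimum-l2-norm starting point
    sigma = 2.0 * np.max(np.abs(s))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # Gradient step on the Gaussian-smoothed l0 measure (shrinks small entries).
            s = s - mu0 * s * np.exp(-s ** 2 / (2 * sigma ** 2))
            # Project back onto the feasible set {s : A @ s = b}.
            s = s - A_pinv @ (A @ s - b)
        sigma *= sigma_decrease
    return s
```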

    Speech signal modeling using multivariate distributions

    , Article Eurasip Journal on Audio, Speech, and Music Processing ; Volume 2015, Issue 1 , 2015 , Pages 1-14 ; 16874714 (ISSN) Aroudi, A ; Veisi, H ; Sameti, H ; Mafakheri, Z ; Sharif University of Technology
    Springer International Publishing  2015
    Using a proper distribution function for speech signal or for its representations is of crucial importance in statistical-based speech processing algorithms. Although the most commonly used probability density function (pdf) for speech signals is Gaussian, recent studies have shown the superiority of super-Gaussian pdfs. A large research effort has focused on the investigation of a univariate case of speech signal distribution; however, in this paper, we study the multivariate distributions of speech signal and its representations using the conventional distribution functions, e.g., multivariate Gaussian and multivariate Laplace, and the copula-based multivariate distributions as candidates.... 
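    The Gaussian versus super-Gaussian question can be illustrated in the univariate case by fitting both models with maximum likelihood and comparing the average log-likelihood. This is a simplification: the paper studies multivariate and copula-based models, and the test data below are synthetic, not speech:

```python
import numpy as np


def gaussian_loglik(x):
    """Average log-likelihood under a Gaussian with ML mean and variance."""
    mu, var = x.mean(), x.var()
    return float(np.mean(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)))


def laplace_loglik(x):
    """Average log-likelihood under a Laplace pdf with ML location and scale."""
    mu = np.median(x)                     # ML location estimate
    b = np.mean(np.abs(x - mu))           # ML scale estimate
    return float(np.mean(-np.log(2 * b) - np.abs(x - mu) / b))
```

    A higher average log-likelihood for the Laplace fit on a given set of coefficients indicates that a super-Gaussian model describes them better.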

    Pathology Analysis and Multi-Class Discrimination for Laryngeal Disorders

    , M.Sc. Thesis Sharif University of Technology Pakravan, Mansooreh (Author) ; Jahed, Mehran (Supervisor)
    The ability to speak lucidly plays a key role in social relations. Consequently, the role of the larynx is quite important, and timely diagnosis of laryngeal diseases has proved to be crucial. Since conventional diagnostic methods for the larynx are usually expensive or bothersome, the aim of this project is to analyze and classify diseases of the larynx with signal processing techniques, which tend to be faster, easier to implement, and quite economical. This study uses the vowel sound /a/ and a well-referenced database, namely MEEI (Massachusetts Eye and Ear Infirmary), which includes 53 normal and 213 abnormal voices in 7 classified diseases. In this work, using existing signal modeling and... 

    Kalman filter based packet loss replacement in presence of additive noise

    , Article 2012 25th IEEE Canadian Conference on Electrical and Computer Engineering: Vision for a Greener Future, CCECE 2012 ; 2012 ; 9781467314336 (ISBN) Miralavi, S. R ; Ghorshi, S ; Tahaei, A
    A major problem in real-time packet-based communication systems is misrouted or delayed packets, which result in degraded perceived voice quality. If packets are not available on time, they are considered lost. The easiest solution in a network terminal receiver is to substitute silence for the duration of the lost speech segments. In a high-quality communication system, a suitable method or algorithm is needed to replace the missing segments of speech and avoid the degradation in speech quality caused by packet loss. In this paper, we introduce an adaptive filter for replacing lost speech segments. In this method, a Kalman filter, a state-space-based method, is used to predict the... 

    Reducing speech recognition costs: By compressing the input data

    , Article IS'2012 - 2012 6th IEEE International Conference Intelligent Systems, Proceedings ; 2012 , Pages 102-107 ; 9781467327824 (ISBN) Halavati, R ; Shouraki, S. B ; Sharif University of Technology
    One of the key constraints on using embedded speech recognition modules is the required computational power. To decrease this requirement, we propose an algorithm that clusters the speech signal before passing it to the recognition units. The algorithm is based on agglomerative clustering and produces a sequence of compressed frames optimized for recognition. Our experimental results indicate that the proposed method achieves an average frame rate of 40 frames per second on medium to large vocabulary isolated word recognition tasks without loss of recognition accuracy, resulting in up to 60% faster recognition compared to the usual fixed-rate sampling at 100 fps. This value is quite...
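    Temporal agglomerative clustering can be sketched as repeatedly merging the two adjacent segments whose mean feature vectors are closest, until a target number of segments remains. The Euclidean distance, the count-weighted mean, and the fixed target `n_out` are assumptions; the paper's criterion for frames "optimized for recognition" is not described in the abstract:

```python
import numpy as np


def compress_frames(frames, n_out):
    """Merge adjacent frames bottom-up until n_out segments remain.

    Returns the segment mean vectors and the number of frames in each segment."""
    segs = [[f.astype(float), 1] for f in frames]   # [mean vector, frame count]
    while len(segs) > n_out:
        d = [np.linalg.norm(segs[i][0] - segs[i + 1][0]) for i in range(len(segs) - 1)]
        i = int(np.argmin(d))                         # closest adjacent pair
        (m1, c1), (m2, c2) = segs[i], segs[i + 1]
        segs[i:i + 2] = [[(c1 * m1 + c2 * m2) / (c1 + c2), c1 + c2]]
    return np.array([m for m, _ in segs]), [c for _, c in segs]
```

    Going from a fixed 100 fps to an average of 40 fps corresponds to choosing `n_out` as roughly 0.4 times the number of input frames.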