Search for: speech-communication
Total 29 records
Article ISSPIT 2007 - 2007 IEEE International Symposium on Signal Processing and Information Technology, Cairo, 15 December 2007 through 18 December 2007 ; 2007 , Pages 285-290 ; 9781424418350 (ISBN) ; Lakdashti, A ; Manzuri Shalmani, M. T ; Sharif University of Technology
Communication of speech over error-prone channels such as wireless channels and the internet usually suffers from the loss of a large number of adjacent samples. In this paper, we propose introducing artificial correlation between speech samples, which distorts the signal. By choosing appropriate parameters, one can keep this distortion within acceptable limits. Using this correlation, the receiver can recover lost samples up to a certain limit using our proposed algorithm. Experimental results show that our solution outperforms a previous one reported in the literature, especially when the number of lost samples is below the mentioned limit. ©2007 IEEE
Article 2007 IEEE International Conference on Communications, ICC'07, Glasgow, Scotland, 24 June 2007 through 28 June 2007 ; August , 2007 , Pages 1778-1783 ; 05361486 (ISSN); 1424403537 (ISBN); 9781424403530 (ISBN) ; Manzuri Shalmani, M. T ; Sharif University of Technology
In internet telephony, the loss of IP packets causes instantaneous discontinuities in the received speech. In this paper, we focus on finding an error-resilient method for this problem. Our proposed method creates artificial correlation between speech samples, which pre-distorts the speech signal. The receiver uses this correlation to reconstruct the lost speech packets. An appropriate speech enhancement technique is designed to reduce the processing error that speech codecs cause in the recovered speech. The SegSNR results show the superiority of our proposed speech enhancement method over a recently proposed one. © 2007 IEEE
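The abstract above reports results in SegSNR (segmental signal-to-noise ratio). A minimal sketch of the commonly used definition follows; the frame length of 256 samples and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration and are not stated in the record.

```python
import numpy as np

def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean of per-frame SNRs in dB between a clean and a processed signal.

    frame_len, lo and hi are illustrative assumptions, not values from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    frame_snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - processed[i * frame_len:(i + 1) * frame_len]
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(np.clip(snr_db, lo, hi))
    return float(np.mean(frame_snrs))
```

Averaging in the dB domain per frame, rather than computing one global SNR, weights quiet and loud speech segments more evenly.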
Article 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, 6 September 2015 through 10 September 2015 ; Volume 2015-January , January , 2015 , Pages 2724-2728 ; 2308457X (ISSN) ; Sameti, H ; Liu, Y ; Sharif University of Technology
International Speech Communication Association 2015
In this paper we investigate the role of discourse analysis in the extractive meeting summarization task. Specifically, our proposed method comprises two distinct steps. First, we use a meeting segmentation algorithm to detect various functional parts of the input meeting. Afterwards, a two-level scoring mechanism in a graph-based framework is used to score each dialogue act in order to extract the most valuable ones and include them in the extracted summary. We evaluate our proposed method on the AMI and ICSI corpora and compare it with other state-of-the-art graph-based algorithms according to various evaluation metrics. The experimental results show that our algorithm outperforms the...
Article 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 May 2019 through 17 May 2019 ; Volume 2019-May , 2019 , Pages 7345-7349 ; 15206149 (ISSN); 9781479981311 (ISBN) ; Moosavi Dezfooli, S. M ; Baghshah, M. S ; Frossard, P ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2019
Despite the vast success neural networks have achieved in different application domains, they have been proven to be vulnerable to adversarial perturbations (small changes in the input), which lead them to produce the wrong output. In this paper, we propose a novel method, based on gradient projection, for generating universal adversarial perturbations for text; namely, a sequence of words that can be added to any input in order to fool the classifier with high probability. We observed that text classifiers are quite vulnerable to such perturbations: inserting even a single adversarial word at the beginning of every input sequence can drop the accuracy from 93% to 50%. © 2019 IEEE
Article 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, 4 May 2020 through 8 May 2020 ; Volume 2020 , May , 2020 , Pages 4032-4036 ; Babaie Zadeh, M ; Jutten, C ; The Institute of Electrical and Electronics Engineers, Signal Processing Society ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2020
This work proposes an approach for pruning a bagging ensemble regression (BER) model based on sparse representation, which we call sparse representation pruning (SRP). First, a BER model with a specific number of subensembles is trained. Then, the BER model is pruned using our sparse representation idea. For this type of regression problem, pruning means removing the subensembles that do not have a significant effect on the prediction of the output. The pruning problem is cast as a sparse representation problem, which is solved by the orthogonal matching pursuit (OMP) algorithm. Experiments show that the pruned BER with only 20% of the initial subensembles has a better...
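The idea of casting pruning as sparse representation can be sketched as follows. This is an illustration under stated assumptions, not the authors' SRP implementation: the function name `omp_prune`, the use of validation-set predictions as dictionary atoms, and the least-squares combination weights are all hypothetical choices.

```python
import numpy as np

def omp_prune(preds, y, k):
    """Select k subensembles with a plain orthogonal matching pursuit.

    preds: (n_samples, n_subensembles) predictions, one column per subensemble.
    y:     (n_samples,) target values.
    Returns the selected column indices and their least-squares weights.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    norms = np.linalg.norm(preds, axis=0)
    norms[norms == 0] = 1.0                # guard against all-zero columns
    atoms = preds / norms                  # unit-norm atoms for the selection step
    residual = y.copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(k):
        corr = np.abs(atoms.T @ residual)  # correlation of each atom with the residual
        if selected:
            corr[selected] = -np.inf       # never pick the same subensemble twice
        selected.append(int(np.argmax(corr)))
        # Re-fit the weights on all selected columns (the "orthogonal" step).
        weights, *_ = np.linalg.lstsq(preds[:, selected], y, rcond=None)
        residual = y - preds[:, selected] @ weights
    return selected, weights
```

Keeping 20% of the subensembles, as in the abstract, would correspond to calling `omp_prune(preds, y, k)` with `k` set to one fifth of the number of columns.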
Article SIGMAP 2007 - 2nd International Conference on Signal Processing and Multimedia Applications, Barcelona, 28 July 2007 through 31 July 2007 ; 2007 , Pages 287-292 ; 9789898111135 (ISBN) ; Ghaemmaghami, S ; Sharif University of Technology
One of the main issues with data hiding algorithms is data embedding capacity. Most data hiding methods suffer from low capacity, which can make them inappropriate for certain hiding applications. This paper presents a high-capacity data hiding method that uses encryption and the multi-band speech synthesis paradigm. In this method, an encrypted covert message is embedded in the unvoiced bands of the speech signal, which leads to a high data hiding capacity of tens of kbps in a typical digital voice file transmission scheme. The proposed method yields a new standpoint in the design of data hiding systems in the sense of three major, basically conflicting requirements in steganography, i.e....
Article Canadian Conference on Electrical and Computer Engineering ; 2014 ; Ghorshi, S ; Sharif University of Technology
Noise reduction of speech signals plays an important role in telecommunication systems. Various types of additive noise, such as babble, crowd, large-city, and highway noise, can be introduced, and these are the main factors degrading perceived speech quality. In some cases on the receiver side of telecommunication systems, the interfering noise itself is not directly available and only the noisy speech is accessible. In these cases the noise cannot be cancelled totally, but it may be possible to reduce it in a sensible way by utilizing the statistics of the noise and speech signal. In this paper the proposed method for noise reduction is Bayesian recursive state-space...
Article ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 22 May 2011 through 27 May 2011, Prague ; 2011 , Pages 3672-3675 ; 15206149 (ISSN) ; 9781457705397 (ISBN) ; Malek-Mohammadi, M ; Babaie-Zadeh, M ; Jutten, C ; Sharif University of Technology
In this paper, we address the matrix completion problem and propose a novel algorithm based on a smoothed rank function (SRF) approximation. Among available algorithms like FPCA and OptSpace, there is no solution that can simultaneously cover a wide range of easy and hard problems. This new algorithm provides accurate results in almost all scenarios with a reasonable run time. In particular, it has a low execution time in hard problems where other methods need a long time to converge. Furthermore, when the rank is known in advance and is high, our method is much faster than previous methods for the same accuracy. The main idea of the algorithm is based on a continuous and differentiable approximation...
Article 2015 28th IEEE Canadian Conference on Electrical and Computer Engineering, CCECE 2015, 3 May 2015 through 6 May 2015 ; Volume 2015-June, Issue June , June , 2015 , Pages 1248-1253 ; 08407789 (ISSN) ; Ghorshi, S ; Sarafnia, A ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2015
Reverberant speech signals in noisy acoustic environments cause problems such as reduced speech intelligibility, difficulty in distinguishing speakers and locating sources, and degraded quality in hands-free telephony, hearing aids, etc. Adaptive filters can be applied to suppress the interfering signals and reduce the reverberation effects, or to dereverberate the speech signals received at the microphone. In this paper, Bayesian State-Space Kalman and Wiener filters have been employed to reduce the effect of noise on the received speech signal, and their results are compared. Also, a dereverberation method is proposed by applying an inverse filter to the received speech signals to reduce the effect of reverberation on...
Article IEEE Transactions on Multimedia ; Volume 18, Issue 12 , 2016 , Pages 2345-2357 ; 15209210 (ISSN) ; Perez Gonzalez, F ; Akhaee, M. A ; Behnia, F ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc
The swift growth of cellular mobile networks in recent years has made voice channels accessible almost everywhere. Besides, data hiding has recently attracted significant attention due to its ability to imperceptibly embed side information that can be used for signal enhancement, security improvement, and two-way authentication purposes. In this regard, we aim to propose efficient schemes for hiding data in the widespread voice channel of cellular networks. To this end, our first contribution is to model the channel accurately by considering a linear filter plus a nonlinear scaling function. This model is validated through experiments with true speech signals. Then we leverage on this...
Article 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 8 September 2016 through 12 September 2016 ; Volume 08-12-September-2016 , 2016 , Pages 440-444 ; 2308457X (ISSN) ; Sameti, H ; Burget, L ; Černocký, J. H ; Maghsoodi, N ; Sharif University of Technology
International Speech Communication Association 2016
Recently, a new data collection was initiated within the RedDots project in order to evaluate text-dependent and text-prompted speaker recognition technology on data from a wider speaker population and with more realistic noise, channel and phonetic variability. This paper analyses our systems built for the RedDots challenge, the effort to collect and compare the initial results on this new evaluation data set obtained at different sites. We use our recently introduced HMM-based i-vector approach, where, instead of the traditional GMM, a set of phone-specific HMMs is used to collect the sufficient statistics for i-vector extraction. Our systems are trained in a completely phrase-independent way on...
Article 2017 12th International Conference on Sampling Theory and Applications, SampTA 2017, 3 July 2017 through 7 July 2017 ; 2017 , Pages 552-555 ; 9781538615652 (ISBN) ; Shahsavari, S ; Marvasti, F ; Sharif University of Technology
In this paper, we provide a comparison between uniform and random sampling for speech and music signals. There are various sampling and recovery methods for audio signals. Here, we investigate only uniform and random schemes for sampling, and basic low-pass filtering and an iterative method with adaptive thresholding for recovery. The simulation results indicate that uniform sampling with cubic spline interpolation outperforms other sampling and recovery methods. © 2017 IEEE
Article 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017 ; Volume 2017-August , 2017 , Pages 518-522 ; 2308457X (ISSN) ; Povey, D ; Sameti, H ; Khudanpur, S ; Amazon Alexa; Apple; DiDi; et al.; Furhat Robotics; Microsoft ; Sharif University of Technology
International Speech Communication Association 2017
We describe our work on incorporating probabilities of phone durations, learned by a neural net, into an ASR system. Phone durations are incorporated via lattice rescoring. The input features are derived from the phone identities of a context window of phones, plus the durations of preceding phones within that window. Unlike some previous work, our network outputs the probability of different durations (in frames) directly, up to a fixed limit. We evaluate this method on several large vocabulary tasks, and while we consistently see improvements in Word Error Rates, the improvements are smaller when the lattices are generated with neural net based acoustic models. Copyright © 2017 ISCA
Article Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2 September 2018 through 6 September 2018 ; Volume 2018-September , 2018 , Pages 1596-1600 ; 2308457X (ISSN) ; Hadian, H ; Lv, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology
International Speech Communication Association 2018
In recent years, different studies have proposed new methods for DNN-based feature extraction and joint acoustic model training and feature learning from raw waveform for large vocabulary speech recognition. However, conventional pre-processing methods such as MFCC and PLP are still preferred in state-of-the-art speech recognition systems as they are perceived to be more robust. Besides, the raw waveform methods, most of which are based on the time-domain signal, do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer which can allow acoustic model training directly from the waveform. The main distinctions from...
Gray-scale image colorization using cycle-consistent generative adversarial networks with residual structure enhancer, Article 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, 4 May 2020 through 8 May 2020 ; Volume 2020 , May , 2020 , Pages 2223-2227 ; Behroozi, H ; The Institute of Electrical and Electronics Engineers, Signal Processing Society ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2020
The colorization of gray-scale images has always been a challenging task in computer vision. Recently, novel approaches have been introduced for unsupervised image translation between two domains using Generative Adversarial Networks (GANs). Since one can consider the gray-scale and color images as two separate domains, we propose a two-stage cycle-consistent network architecture to produce convincing images. First, an intermediate image is generated with a relatively uncomplicated objective function at the output. Next, at the second stage, the intermediate image is enhanced via a residual network structure with a more complicated objective function. Furthermore, by employing two...
Article 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, 4 May 2020 through 8 May 2020 ; Volume 2020-May , 2020 , Pages 3417-3421 ; Sadeghi, M ; Babaie Zadeh, M ; Jutten, C ; The Institute of Electrical and Electronics Engineers, Signal Processing Society ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2020
In dictionary learning, a desirable property for the dictionary is to be of low mutual and average coherences. Mutual coherence is defined as the maximum absolute correlation between distinct atoms of the dictionary, whereas the average coherence is a measure of the average correlations. In this paper, we consider a dictionary learning problem regularized with the average coherence and constrained by an upper-bound on the mutual coherence of the dictionary. Our main contribution is then to propose an algorithm for solving the resulting problem based on convexly approximating the cost function over the dictionary. Experimental results demonstrate that the proposed approach has higher...
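The two coherence measures named above can be computed from the Gram matrix of the normalized dictionary. The sketch below follows the abstract's definition of mutual coherence; for average coherence the abstract only says "a measure of the average correlations", so the mean absolute off-diagonal correlation used here is an assumption (other works use, for example, a mean of squared correlations).

```python
import numpy as np

def coherences(D):
    """Return (mutual, average) coherence of a dictionary with atoms as columns."""
    D = np.asarray(D, dtype=float)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    gram = np.abs(D.T @ D)                            # |correlation| between atoms
    off_diag = gram[~np.eye(gram.shape[0], dtype=bool)]
    mutual = float(off_diag.max())    # maximum over distinct atom pairs
    average = float(off_diag.mean())  # assumed definition of average coherence
    return mutual, average
```

An orthonormal dictionary gives zero for both measures, while two nearly parallel atoms push the mutual coherence toward one.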
Article 2008 IEEE Region 10 Conference, TENCON 2008, Hyderabad, 19 November 2008 through 21 November 2008 ; 2008 ; 1424424089 (ISBN); 9781424424085 (ISBN) ; Ghaemmaghami, S ; Sharif University of Technology
This paper addresses a new approach to data hiding that leads to a high data embedding rate of tens of kbps in a typical digital voice file transmission scheme. The scope of the proposed method is restricted to offline voice transmission that uses stego speech files in wave format. The basic idea of the algorithm is to embed an encrypted covert message in the unvoiced bands of the spectrum of the cover speech. Inaudibility of the proposed hiding scheme is investigated through both support vector machine (SVM)-based steganalysis and the ITU-T P.862 PESQ standard speech quality assessment. The results confirm the imperceptibility and transparency of the stego speech.
Article European Signal Processing Conference ; Volume. 97, Issue. 9 , 2014 , pp. 2510-2514 ; ISSN: 22195491 ; Kazemi, R ; Behnia, F ; Akhaee, M. A ; Sharif University of Technology
This paper considers the problem of covert communication through dedicated voice channels by embedding secure data in the cover speech signal utilizing spread spectrum additive data hiding. The cover speech signal is modeled by a Generalized Gaussian (GGD) random variable and the Maximum A Posteriori (MAP) detector for extraction of the covert message is designed and its reliable performance is verified both analytically and by simulations. The idea of adaptive estimation of detector parameters is proposed to improve detector performance and overcome voice non-stationarity. The detector's bit error rate (BER) is investigated for both blind and semi-blind cases in which the GGD shape...
Article Proceedings Elmar - International Symposium Electronics in Marine ; 2013 , Pages 207-210 ; 13342630 (ISSN); 9789537044145 (ISBN) ; Ghorshi, S ; Sarafnia, A ; Sharif University of Technology
Speech enhancement in real-time applications improves the quality and intelligibility of speech and reduces communication fatigue. Nowadays, due to the reactivity of systems and the spread of online real-time applications, including VoIP, state-space models are widely used. This paper presents a speech enhancement method based on an adaptive Bayesian-Kalman filter and Bayesian-MAP estimation to improve the performance and quality of the enhancement procedure. The enhancement method combines a Bayesian-Kalman filter for noise reduction with Bayesian-MAP estimation for parameter estimation of the lost speech segments. Performance evaluation and result of the proposed...
Article Canadian Conference on Electrical and Computer Engineering, 8 May 2011 through 11 May 2011 ; May , 2011 , Pages 000292-000295 ; 08407789 (ISSN) ; 9781424497898 (ISBN) ; Ghorshi, S ; Mortazavi, M ; Pasha, S ; Sharif University of Technology
In real-time packet-based communication systems, one major problem is misrouted or delayed packets, which result in degraded perceived voice quality. If a packet is not available on time, it is treated as a lost packet. The easiest option for a network terminal receiver is to substitute silence for the duration of the lost speech segments. In a high-quality communication system, to avoid quality reduction due to packet loss, a suitable method and/or algorithm is needed to replace the missing segments of speech. In this paper, we introduce a low-order recursive linear prediction method for replacement of lost speech segments. In this method a normalized least mean square (NLMS) as an adaptive filter...
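Since the abstract is truncated, the authors' exact procedure is not available here. The sketch below only illustrates the general idea of a low-order linear predictor adapted with NLMS on the received samples and then run recursively to fill a lost segment; the function name `nlms_conceal`, the order of 4 and the step size of 0.5 are hypothetical.

```python
import numpy as np

def nlms_conceal(history, n_lost, order=4, mu=0.5, eps=1e-8):
    """Extrapolate n_lost samples after `history` with an NLMS-trained predictor."""
    history = np.asarray(history, dtype=float)
    w = np.zeros(order)
    # Adapt the predictor on the samples received before the loss.
    for n in range(order, len(history)):
        x = history[n - order:n][::-1]         # most recent sample first
        err = history[n] - w @ x               # one-step prediction error
        w += mu * err * x / (eps + x @ x)      # normalized LMS update
    # Recursive prediction: each predicted sample is fed back as input.
    buf = list(history[-order:])
    out = []
    for _ in range(n_lost):
        x = np.array(buf[-order:][::-1])
        y = float(w @ x)
        out.append(y)
        buf.append(y)
    return np.array(out)
```

Because predictions are fed back as inputs, errors accumulate with the length of the gap, so a sketch like this is only plausible for short lost segments.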