Automatic Speech Recognition System for Pilot-Air Traffic Service Units Communications

Azadmanesh, Mahsa; Bahrani, Mohammad; Baba Ali, Bagher; Pazooki, Farshad


Azadmanesh, Mahsa | 2017

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 50731 (31)
University: Sharif University of Technology
Department: Languages and Linguistics Center
Advisor(s): Bahrani, Mohammad; Baba Ali, Bagher; Pazooki, Farshad
Abstract:
Currently, in the Islamic Republic of Iran, conversations between pilots and air traffic controllers are re-examined and transcribed by the State Air Transport Organization of the Islamic Republic of Iran after aviation accidents and incidents. The automatic speech recognition system for pilot-air traffic service unit communications is intended to support this transcription. It can also reduce the time and cost of converting conversations into text and help create an aviation database in the country. In this research, after collecting and refining actual conversations between pilots and air traffic controllers and examining seven methods, we design a system that converts audio data into text using the Kaldi toolkit with perceptual linear prediction (PLP) feature extraction. After tagging the data and constructing a lexicon of about 700 words, we design a tri-phone linguistic model. In the audio files, non-speech noise lasting more than half a second is deleted, and the data is divided into files of 30 seconds or less. When the files are stored, they are converted from stereo to mono, and all data is saved at a sampling rate of 16 kHz with 16-bit samples. After refinement, a 280 MB file is reduced to the equivalent of 54 minutes of conversation, which we use to design the acoustic model. The acoustic model is built using the Hidden Markov Model, the Subspace Gaussian Mixture Model, and the Deep Neural Network. As the results section shows, given the noisy data and the small volume of actual conversations available, the best result is achieved by the Deep Neural Network, with a precision of 33.96%.
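The preprocessing steps named in the abstract (stereo-to-mono conversion, removal of non-speech stretches longer than half a second, and splitting into files of at most 30 seconds) can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis's own implementation: the function names, the amplitude `threshold`, and the use of a simple amplitude test in place of a real non-speech detector are all assumptions, and the input is assumed to be already sampled at 16 kHz with 16-bit integer samples.

```python
RATE = 16_000      # sampling rate stated in the abstract (Hz)
MAX_GAP_S = 0.5    # non-speech stretches longer than this are dropped
MAX_CHUNK_S = 30   # maximum length of one output file (seconds)


def to_mono(left, right):
    """Average two channels into one, keeping integer samples."""
    return [(l + r) // 2 for l, r in zip(left, right)]


def drop_long_gaps(samples, rate=RATE, threshold=500, max_gap_s=MAX_GAP_S):
    """Remove runs of quiet samples that last longer than max_gap_s.

    `threshold` is a hypothetical amplitude cut-off standing in for a
    real speech/non-speech detector (an assumption, not from the thesis).
    """
    max_gap = int(max_gap_s * rate)
    kept, run = [], []
    for s in samples:
        if abs(s) < threshold:
            run.append(s)            # still inside a quiet stretch
        else:
            if len(run) <= max_gap:
                kept.extend(run)     # short pause: keep it
            run = []                 # long pause: discard it
            kept.append(s)
    if len(run) <= max_gap:
        kept.extend(run)             # trailing short pause
    return kept


def split_chunks(samples, rate=RATE, max_chunk_s=MAX_CHUNK_S):
    """Cut the signal into consecutive pieces of at most max_chunk_s seconds."""
    size = max_chunk_s * rate
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

In practice each chunk would then be written out as a mono 16 kHz, 16-bit file before feature extraction; how the thesis performs that step is not described in this record.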
Keywords:
Automatic Speech Recognition; Feature Extraction; Aviation Conversations Recognition; Perceptual Linear Prediction; Noisy Environment; Radio Communication Conversations
