Using Audio Speech Recognition Techniques in Augmented Reality Environment

Mirzaei, Mohammad Reza; Ghorshi, Alireza; Mortazavi, Mohammad


Mirzaei, Mohammad Reza | 2012


Type of Document: M.Sc. Thesis
Language: English
Document No: 42813 (52)
University: Sharif University of Technology, International Campus, Kish Island
Department: Science and Engineering
Advisor(s): Ghorshi, Alireza; Mortazavi, Mohammad
Abstract:
Recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) can help people with disabilities. In this thesis, we examine the feasibility of combining AR and ASR technologies to implement a new system for helping deaf people. The system takes a narrator's speech, instantly converts it into readable text, and shows it directly on an AR display. With this system, people do not need to learn sign language to communicate with deaf people. To improve the accuracy of the system, we use Audio-Visual Speech Recognition (AVSR) as a backup for the ASR engine in noisy environments. AVSR is an advance in ASR technology that combines audio, video, and facial expressions to capture a narrator's speech. In addition, we use a Text-to-Speech (TTS) system to make our system more usable for deaf people. Since most deaf people have speech and dialect problems, they can use TTS to talk with others by having a computer generate a spoken version of their text. Testing showed that the system's accuracy is over 85 percent on average across different ASR engines, such as Dragon NaturallySpeaking, Dragon Dictate, and Microsoft Speech Recognition, in different noisy environments. Also, when the AVSR engine is used as a backup, the Word Error Rate (WER) is lower than that of the ASR and Visual Speech Recognition (VSR) engines in noisy environments. The results of testing TTS engines show that TTS can be used in our system in terms of voice quality and processing time. The results of the surveys, which were conducted among deaf and hearing people, show that more than 80 percent of deaf people on average are very interested in using our system as an assistant on portable devices for communication.
Keywords:
Automatic Speech Recognition ; Image Processing ; Video Processing ; Communications ; Deaf People ; Augmented Reality
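
The abstract reports results in terms of Word Error Rate (WER) but does not state how it was computed. As an illustration only, the sketch below uses the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Assumption: whitespace tokenization and exact, case-sensitive word
    matching. The thesis may have normalized text differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in recognizer output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word and one substituted word
# against a six-word reference.
wer = word_error_rate("show the text on the display",
                      "show text on a display")
print(f"WER = {wer:.3f}")
```

Under this definition, a lower WER for the AVSR backup means fewer word-level errors per reference word than the audio-only or visual-only engines on the same test speech.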
