Reconstructing a Talking Face to Utter a Different Spoken Phrase Based on Natural Facial Expressions

پیغان، محمد رضا

Speech-Driven Talking Face Synthesis based on True Articulatory Gestures

2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 53853 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Ghaemmaghami, Shahrokh; Behroozi, Hamid
  7. Abstract: Talking face synthesis is the process of generating a talking face from audio-visual data or features extracted from it. Because the face is the first output, face animation plays a crucial role in this process. A high-quality face, a balance between different facial regions, and natural movements of the facial organs are among the basic requirements for synthesizing a reasonably realistic talking face. A photo-realistic talking face has a wide variety of applications; serving as a teaching assistant and reading emails or e-books aloud are two simple examples. To obtain a realistic talking face that meets these requirements, we set the goal of accounting for all face regions and their movements. To reach this goal, we use a 3D morphable model, which lets us control most major aspects of the face with a small number of parameters. To estimate these parameters, we use deep neural networks. In this project, we use a modified version of the residual network ResNet-50 in which only the dimension of the last layer is changed. We propose a novel methodology with three steps. First, we predict the 3D morphable model coefficients of face images extracted from a video by feeding them to the modified ResNet-50, which is trained on the MICC database. Second, we replace a specific subset of these coefficients with fake parameters, namely the target facial expression, extracted from target images by the same process. Finally, we reconstruct the fake face mask, and then the fake images, using the 3D morphable model. Because a single network is used for both parts and audio-visual data is exploited, the proposed method outperforms similar state-of-the-art work in terms of quality and run time.
  8. Keywords: Audio Visual Database ; Neural Network ; Facial Activity Recognition ; Face Expression ; Talking Face ; Face Mask
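
The coefficient-swapping step described in the abstract can be illustrated with a minimal sketch. This record does not specify how the 3D morphable model is parameterized, so the sketch assumes a common linear form in which the face mesh is a mean shape plus separate identity and expression bases. All names (`reconstruct_vertices`, `swap_expression`, the basis sizes) are hypothetical, and random arrays stand in for the coefficients that the thesis obtains from the modified ResNet-50.

```python
import numpy as np


def reconstruct_vertices(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Assumed linear 3DMM: mean + identity offset + expression offset.

    Returns an (n_vertices, 3) array of mesh vertex positions.
    """
    flat = mean_shape + id_basis @ id_coeffs + exp_basis @ exp_coeffs
    return flat.reshape(-1, 3)


def swap_expression(source_coeffs, target_coeffs):
    """Keep the source identity and take the expression from the target."""
    return {
        "identity": source_coeffs["identity"].copy(),
        "expression": target_coeffs["expression"].copy(),
    }


# Toy model with random bases (illustration only, not a real face model).
rng = np.random.default_rng(0)
n_vertices, n_id, n_exp = 5, 4, 3
mean_shape = rng.normal(size=3 * n_vertices)
id_basis = rng.normal(size=(3 * n_vertices, n_id))
exp_basis = rng.normal(size=(3 * n_vertices, n_exp))

# Stand-ins for coefficients a regression network would predict per frame.
source = {"identity": rng.normal(size=n_id), "expression": rng.normal(size=n_exp)}
target = {"identity": rng.normal(size=n_id), "expression": rng.normal(size=n_exp)}

# Replace the source expression with the target one, then rebuild the mesh.
fake = swap_expression(source, target)
fake_vertices = reconstruct_vertices(
    mean_shape, id_basis, exp_basis, fake["identity"], fake["expression"]
)
```

In this sketch the resulting mesh keeps the source speaker's identity while adopting the target expression; rendering the mesh back into an image, as the abstract's final step describes, is outside its scope.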

