
Speech-Driven Facial Reenactment

Jalalifar, Ali | 2018

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 51052 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Karbalaei Aghajan, Hamid
  7. Abstract: Creating talking heads from audio input is interesting from both scientific and practical viewpoints, e.g., constructing computer-generated virtual characters, aiding hearing-impaired people, and live dubbing of videos with translated audio. Due to this wide variety of applications, audio-to-video mapping has been the focus of intensive research in recent years. Mapping audio to facial images with accurate lip sync is an extremely difficult task, both because it is a mapping from a 2-dimensional to a 3-dimensional space and because humans are experts at detecting any out-of-sync lip movement with respect to an audio track. Approaches to automatically generating natural-looking speech animation usually involve manipulating 3D computer-generated faces, and it was not until recently that highly realistic facial reenactment became achievable. We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we predict mouth landmarks from audio features. We then exploit the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks are capable of producing a sequence of natural faces in sync with an input audio track (a minimal sketch of this two-stage pipeline follows the record below).
  8. Keywords: Deep Learning ; Machine Learning ; Long Short Term Memory (LSTM) ; Speech to Video Mapping ; Conditional Generative Adversarial Networks
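
The abstract describes a two-stage pipeline: a recurrent network (LSTM) maps audio features to mouth landmarks, and a conditional GAN generator synthesizes face frames conditioned on those landmarks. The following is a minimal sketch of how such a pipeline could be wired together, not the author's code: it assumes PyTorch, 13-dimensional MFCC audio features, 20 mouth landmarks, and 64x64 output frames, all of which are illustrative choices not specified in the record.

    # Illustrative two-stage sketch (assumed PyTorch); dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class AudioToLandmarks(nn.Module):
        """LSTM that maps a sequence of audio features to mouth landmark coordinates."""
        def __init__(self, n_mfcc=13, hidden=256, n_landmarks=20):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden, n_landmarks * 2)  # (x, y) per landmark

        def forward(self, audio_feats):            # audio_feats: (B, T, n_mfcc)
            out, _ = self.lstm(audio_feats)        # (B, T, hidden)
            return self.head(out)                  # (B, T, n_landmarks * 2)

    class LandmarkConditionedGenerator(nn.Module):
        """Conditional-GAN generator: noise + landmark vector -> 64x64 RGB face frame."""
        def __init__(self, n_landmarks=20, z_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(z_dim + n_landmarks * 2, 256, 4, 1, 0),
                nn.BatchNorm2d(256), nn.ReLU(True),                       # -> 4x4
                nn.ConvTranspose2d(256, 128, 4, 2, 1),
                nn.BatchNorm2d(128), nn.ReLU(True),                       # -> 8x8
                nn.ConvTranspose2d(128, 64, 4, 2, 1),
                nn.BatchNorm2d(64), nn.ReLU(True),                        # -> 16x16
                nn.ConvTranspose2d(64, 32, 4, 2, 1),
                nn.BatchNorm2d(32), nn.ReLU(True),                        # -> 32x32
                nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),            # -> 64x64 RGB
            )

        def forward(self, z, landmarks):           # z: (B, z_dim), landmarks: (B, n_landmarks*2)
            cond = torch.cat([z, landmarks], dim=1).unsqueeze(-1).unsqueeze(-1)
            return self.net(cond)                  # (B, 3, 64, 64)

    # Chaining the two stages: audio features -> mouth landmarks -> face frames.
    audio_feats = torch.randn(1, 50, 13)           # 50 frames of 13-dim MFCCs (illustrative)
    landmarks = AudioToLandmarks()(audio_feats)    # (1, 50, 40)
    z = torch.randn(50, 100)
    faces = LandmarkConditionedGenerator()(z, landmarks.squeeze(0))
    print(faces.shape)                             # torch.Size([50, 3, 64, 64])

A complete system would train these modules on paired audio/video data (the LSTM with a landmark-regression loss, the generator adversarially against a discriminator); the sketch above only shows how the two stages chain together at inference time.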
