
Speech-Driven Facial Reenactment

Jalalifar, Ali | 2018

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 51052 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Karbalaei Aghajan, Hamid
  7. Abstract: Creating talking heads from audio input is interesting from both scientific and practical viewpoints, e.g., constructing computer-generated virtual characters, aiding hearing-impaired people, and live dubbing of videos with translated audio. Due to this wide variety of applications, audio-to-video mapping has been the focus of intensive research in recent years. Mapping audio to facial images with accurate lip sync is an extremely difficult task, both because it is a mapping from a 2-dimensional to a 3-dimensional space and because humans are experts at detecting any out-of-sync lip movement with respect to an audio track. Approaches to automatically generating natural-looking speech animation usually involve manipulating 3D computer-generated faces, and it was not until recently that highly realistic facial reenactment became achievable. We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we predict mouth landmarks from audio features. We then exploit the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks are capable of producing a sequence of natural faces in sync with an input audio track (a minimal sketch of this two-stage pipeline follows the record below).
  8. Keywords: Deep Learning ; Machine Learning ; Long Short Term Memory (LSTM) ; Speech to Video Mapping ; Conditional Generative Adversarial Networks
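
The abstract describes a two-stage pipeline: a recurrent network (LSTM) maps audio features to mouth landmarks, and a conditional GAN generator synthesizes face frames conditioned on those landmarks. The following is a minimal sketch of how such a pipeline could be wired together, not the author's code: it assumes PyTorch, 13-dimensional MFCC audio features, 20 mouth landmarks, and 64x64 output frames, all of which are illustrative choices not specified in the record.

    # Illustrative two-stage sketch (assumed PyTorch); dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class AudioToLandmarks(nn.Module):
        """LSTM that maps a sequence of audio features to mouth landmark coordinates."""
        def __init__(self, n_mfcc=13, hidden=256, n_landmarks=20):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden, n_landmarks * 2)  # (x, y) per landmark

        def forward(self, audio_feats):            # audio_feats: (B, T, n_mfcc)
            out, _ = self.lstm(audio_feats)        # (B, T, hidden)
            return self.head(out)                  # (B, T, n_landmarks * 2)

    class LandmarkConditionedGenerator(nn.Module):
        """Conditional-GAN generator: noise + landmark vector -> 64x64 RGB face frame."""
        def __init__(self, n_landmarks=20, z_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(z_dim + n_landmarks * 2, 256, 4, 1, 0),
                nn.BatchNorm2d(256), nn.ReLU(True),                       # -> 4x4
                nn.ConvTranspose2d(256, 128, 4, 2, 1),
                nn.BatchNorm2d(128), nn.ReLU(True),                       # -> 8x8
                nn.ConvTranspose2d(128, 64, 4, 2, 1),
                nn.BatchNorm2d(64), nn.ReLU(True),                        # -> 16x16
                nn.ConvTranspose2d(64, 32, 4, 2, 1),
                nn.BatchNorm2d(32), nn.ReLU(True),                        # -> 32x32
                nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),            # -> 64x64 RGB
            )

        def forward(self, z, landmarks):           # z: (B, z_dim), landmarks: (B, n_landmarks*2)
            cond = torch.cat([z, landmarks], dim=1).unsqueeze(-1).unsqueeze(-1)
            return self.net(cond)                  # (B, 3, 64, 64)

    # Chaining the two stages: audio features -> mouth landmarks -> face frames.
    audio_feats = torch.randn(1, 50, 13)           # 50 frames of 13-dim MFCCs (illustrative)
    landmarks = AudioToLandmarks()(audio_feats)    # (1, 50, 40)
    z = torch.randn(50, 100)
    faces = LandmarkConditionedGenerator()(z, landmarks.squeeze(0))
    print(faces.shape)                             # torch.Size([50, 3, 64, 64])

A complete system would train these modules on paired audio/video data (the LSTM with a landmark-regression loss, the generator adversarially against a discriminator); the sketch above only shows how the two stages chain together at inference time.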
