Loading...

Video Captioning using Deep Recurrent Neural Networks

Mir Mohammad Sadeghi, Alireza | 2017

526 Viewed
  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 51085 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Soleymani, Mahdieh
  7. Abstract:
  8. Solving the visual symbol grounding problem has long been a goal of modern aritificial intelligence. Due to recent breakthroughs in deep learning methods for natural language processing and visual interpretation tasks‚ the field now seems to be as near to achieving this goal as it ever was. Also recent progress in using recurrent neural netowrks (RNNs) for image description‚ has motivated the exploration of their application for video description tasks. However, while images remain static‚ interpreting videos require modeling complex dynamic temporal sturctures and then properly integrating that information into a natural language description. Recurrent neural networks can be both used to encode the input video and also generate the corresponding natural language descriptions. In this work‚ we present a model based on a special recurrent neural network‚ namely the LSTM network‚ that takes into account the inherent hierarchical structure of the videos‚ when generating the video representation. Unlike the traditional encoder-decoder approach‚ in which a video is continously encoded by a recurrent layer‚ we propose a novel LSTM cell‚ that can estimate the probability of confronting a new scene‚ thus forgetting just the right amount of what the cell knows from the past‚ and start generating a new‚ different encoding for the newely encountered scene. We evaluate our approach on three large-scale datasets‚ namely the MSVD‚ M-VAD and MPII-MD datasets and consequently show that our proposed method improves on the existing state of the art methods in some instances and works nearly equaly well on the other ones
  9. Keywords:
  10. Deep Neural Networks ; Recurrent Neural Networks ; Convolutional Neural Network ; Outomatic Description of Images ; Video Captioning

 Digital Object List

 Bookmark

No TOC