Video Captioning using Deep Recurrent Neural Networks

Mir Mohammad Sadeghi, Alireza; Soleymani, Mahdieh

Mir Mohammad Sadeghi, Alireza | 2017

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 51085 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Soleymani, Mahdieh
Abstract:
Solving the visual symbol grounding problem has long been a goal of modern artificial intelligence. Thanks to recent breakthroughs in deep learning methods for natural language processing and visual interpretation tasks, the field now seems closer to this goal than it has ever been. Recent progress in using recurrent neural networks (RNNs) for image description has also motivated the exploration of their application to video description tasks. However, while images are static, interpreting videos requires modeling complex dynamic temporal structures and then properly integrating that information into a natural language description. Recurrent neural networks can be used both to encode the input video and to generate the corresponding natural language description. In this work, we present a model based on a special recurrent neural network, namely the LSTM network, that takes into account the inherent hierarchical structure of videos when generating the video representation. Unlike the traditional encoder-decoder approach, in which a video is continuously encoded by a recurrent layer, we propose a novel LSTM cell that can estimate the probability of encountering a new scene, and can thus forget just the right amount of what the cell knows from the past and start generating a new, different encoding for the newly encountered scene. We evaluate our approach on three large-scale datasets, namely MSVD, M-VAD and MPII-MD, and show that the proposed method improves on existing state-of-the-art methods in some instances and performs nearly as well in the others.
Keywords:
Deep Neural Networks ; Recurrent Neural Networks ; Convolutional Neural Network ; Automatic Description of Images ; Video Captioning
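
The scene-aware forgetting described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual formulation, which the abstract does not give. It assumes one plausible reading: a boundary unit outputs a probability p that the current frame starts a new scene, and the previous hidden and memory states are scaled by (1 - p) before an ordinary LSTM update. The class name BoundaryGatedCell, the weight names, and the absence of gate biases are all hypothetical simplifications.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


class BoundaryGatedCell:
    """Toy LSTM-like cell (hypothetical): past state is scaled by (1 - p_boundary)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                    for _ in range(hidden_size)]

        # Standard LSTM gate weights (biases omitted for brevity).
        self.w_i, self.w_f, self.w_o, self.w_g = matrix(), matrix(), matrix(), matrix()
        # Assumed boundary detector: one logistic unit over [x, h].
        self.w_b = [rng.uniform(-0.1, 0.1) for _ in range(n)]
        self.b_b = 0.0
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        # Estimate the probability that this frame begins a new scene.
        p = sigmoid(dot(self.w_b, list(x) + list(h)) + self.b_b)
        keep = 1.0 - p
        # Forget the past in proportion to the boundary probability.
        h_prev = [keep * v for v in h]
        c_prev = [keep * v for v in c]
        z = list(x) + h_prev
        new_h, new_c = [], []
        for k in range(self.hidden_size):
            i = sigmoid(dot(self.w_i[k], z))    # input gate
            f = sigmoid(dot(self.w_f[k], z))    # forget gate
            o = sigmoid(dot(self.w_o[k], z))    # output gate
            g = math.tanh(dot(self.w_g[k], z))  # candidate memory
            ck = f * c_prev[k] + i * g
            new_c.append(ck)
            new_h.append(o * math.tanh(ck))
        return new_h, new_c, p


cell = BoundaryGatedCell(input_size=3, hidden_size=2)
frame = [0.5, -0.2, 0.1]  # stand-in for a per-frame CNN feature vector
h, c, p = cell.step(frame, [0.3, -0.4], [1.0, 2.0])
```

When p is close to 1 the cell effectively restarts its encoding from the current frame, and when p is close to 0 it behaves like a regular LSTM step. In a trained model the boundary weights would be learned jointly with the rest of the network; here they are random placeholders.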
