Visual Question Answering

Salari, Arsalan; Manzuri, Mohammad Taghi

Visual Question Answering, M.Sc. Thesis, Sharif University of Technology. Salari, Arsalan (Author); Manzuri, Mohammad Taghi (Supervisor)

Abstract

Deep-learning systems for Visual Question Answering (VQA) tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a Visually Directed Question Encoder to replace the RNNs commonly used in base models. Our method uses visual features alongside the word embeddings of the question words to encode each word. As a result, the model is forced to look at the visual information relevant to each word, and it no longer produces answers based on the question alone. We evaluate our approach on the VQA generalization task...
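The idea of encoding each question word together with visual features can be illustrated with a minimal sketch. This is not the thesis's architecture, which the abstract does not specify. It assumes, purely for illustration, that each word embedding attends over image region features with scaled dot-product attention and that the attended visual vector is concatenated to the word embedding. The function name `encode_words_with_vision`, the shared dimension `d`, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def encode_words_with_vision(word_embs, region_feats):
    """Hypothetical sketch (an assumption, not the thesis's method).

    word_embs:    T word vectors, each of length d
    region_feats: R image-region vectors, each of length d
    Returns T encoded vectors of length 2*d and the T attention maps.
    """
    d = len(region_feats[0])
    encoded, attn_maps = [], []
    for w in word_embs:
        # Score every image region against the current word.
        weights = softmax([dot(w, r) / math.sqrt(d) for r in region_feats])
        # Attention-weighted average of the region features.
        attended = [
            sum(a * r[i] for a, r in zip(weights, region_feats))
            for i in range(d)
        ]
        # The word encoding now carries word-specific visual information.
        encoded.append(list(w) + attended)
        attn_maps.append(weights)
    return encoded, attn_maps


# Random stand-ins for real word embeddings and region features.
random.seed(0)
T, R, d = 5, 36, 8
word_embs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
region_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(R)]

encoded, attn_maps = encode_words_with_vision(word_embs, region_feats)
print(len(encoded), len(encoded[0]))  # 5 16
```

In this sketch every word's representation depends on the image, so a downstream answer classifier cannot be driven by the question text alone, which is the property the abstract describes.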
