Visual Question Answering

Salari, Arsalan | 2021

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 53725 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Manzuri, Mohammad Taghi
Abstract:
Deep-learning systems for Visual Question Answering (VQA) tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a Visually Directed Question Encoder to replace the RNNs commonly used in base models. Our method uses visual features alongside the word embeddings of the question to encode each word. As a result, the model is forced to look at the visual information relevant to each word and no longer produces answers based on the question alone. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a 10.88 percent improvement when using UpDn as the base model.
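The abstract does not give the encoder's equations, so the following is only an illustrative sketch of the general idea it describes: each question word is encoded together with the visual features relevant to that word. Everything specific here is an assumption rather than the thesis's method: the dot-product attention, the recurrent update h_t = tanh(W [w_t; c_t; h_{t-1}]), the requirement that word and region vectors share a dimension, and the hypothetical names `attend` and `encode_question`.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec, regions):
    """Attend over image regions using one word as the query.

    Assumption: plain dot-product attention, with word and region
    vectors of the same dimension. Returns (weights, context).
    """
    weights = softmax([dot(word_vec, r) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions))
               for d in range(dim)]
    return weights, context


def encode_question(word_vecs, regions, W):
    """Hypothetical visually directed encoder (not the thesis's equations).

    For each word, the input to the recurrent step is the concatenation
    [word embedding; attended visual vector; previous hidden state], so
    the question encoding depends on the image as well as the words.
    W has one row per hidden unit, each of length 2 * dim + hidden_size.
    """
    hidden = [0.0] * len(W)
    for w in word_vecs:
        _, context = attend(w, regions)
        x = list(w) + context + hidden  # concatenation of the three parts
        hidden = [math.tanh(dot(row, x)) for row in W]
    return hidden


if __name__ == "__main__":
    # Toy inputs: two 2-d word embeddings, two 2-d region features.
    words = [[1.0, 0.0], [0.0, 1.0]]
    image = [[1.0, 0.0], [0.0, 1.0]]
    W = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
         [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]
    print(encode_question(words, image, W))
```

In this sketch the same question yields a different encoding when the region features change, which is the property the abstract relies on to keep the model from answering from the question alone. A question-only RNN encoder would not have this dependence.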
Keywords:
Visual Question Answering ; Bias ; Deep Learning ; Unsupervised Learning