Visual Question Answering, M.Sc. Thesis, Sharif University of Technology; Manzuri, Mohammad Taghi (Supervisor)
Abstract
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a Visually Directed Question Encoder to replace the RNNs commonly used in base models. Our method encodes each question word using visual features alongside the word's embedding. As a result, the model is forced to look at the visual information relevant to each word, and it no longer produces answers based on the question alone. We evaluate our approach on the VQA generalization task...
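The abstract does not say how visual features are combined with each word embedding. As a rough illustration only, the sketch below assumes a simple dot-product attention: each word embedding acts as a query over a set of image region features, and the word's encoding is its embedding concatenated with the attended visual vector. The function names, the attention form, and the assumption that word embeddings and region features share one dimension are all hypothetical; this is not the thesis's actual encoder.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Vector:
    """Weighted sum of region features, weighted by the softmax of their
    dot-product similarity to the query (assumes same dimension as query)."""
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def visually_directed_encode(word_embs: List[Vector], regions: List[Vector]) -> List[Vector]:
    """Hypothetical per-word encoder: each word embedding is concatenated with
    the visual context it attends to, so every word's encoding depends on the image."""
    return [list(w) + attend(w, regions) for w in word_embs]


# Usage: two toy 2-d word embeddings and two toy image regions.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[10.0, 0.0], [0.0, 10.0]]
encoded = visually_directed_encode(words, regions)
print(len(encoded), len(encoded[0]))  # 2 4
```

Each word pulls in mostly the region it is most similar to, so the encoding of a question cannot be computed without the image. A full model would presumably pass these per-word vectors on to further layers; that part is outside what the abstract describes.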