Visual Question Answering

Salari, Arsalan | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 53725 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Manzuri, Mohammad Taghi
  7. Abstract: Deep-learning systems for Visual Question Answering (VQA) tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a Visually Directed Question Encoder to replace the RNNs commonly used in base models. Our method encodes each question word using visual features alongside its word embedding. As a result, the model is forced to look at the visual information relevant to each word and no longer produces answers based on the question alone. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a 10.88 percent improvement when using UpDn as the base model.
  8. Keywords: Visual Question Answering; Bias; Deep Learning; Unsupervised Learning
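
The abstract describes encoding each question word from both its word embedding and the visual features. The sketch below illustrates that general idea only. The dot-product attention, the concatenation step, and the names `attend` and `encode_question` are assumptions made for illustration; the abstract does not specify the thesis's actual architecture, and this sketch omits the learned parameters a real model would have.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec, region_feats):
    """Average the image-region features, weighted by their similarity to one word.

    Assumption: similarity is a plain dot product, so the word embedding and
    the region features must share the same dimension.
    """
    scores = [sum(w * r for w, r in zip(word_vec, region)) for region in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    return [
        sum(weight * region[d] for weight, region in zip(weights, region_feats))
        for d in range(dim)
    ]


def encode_question(word_vecs, region_feats):
    """Encode each word as its embedding concatenated with its attended visual context."""
    return [list(word) + attend(word, region_feats) for word in word_vecs]


# Toy input: two 2-d word embeddings and two 2-d image-region features.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[5.0, 0.0], [0.0, 5.0]]
encoded = encode_question(words, regions)
```

In this toy input, each encoded word carries visual context dominated by the region most similar to it, so a downstream answer classifier cannot rely on the question text alone.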
