
Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-Modality and Unimodality in Visual Word Sense Disambiguation

Taghavi, Z.; Sharif University of Technology | 2023

  1. Type of Document: Article
  2. Publisher: Association for Computational Linguistics, 2023
  3. Abstract: This paper presents an approach to the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” in a zero-shot learning setting. We compared the accuracy of different combinations of tools, including “Simple prompt-based” and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing the input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach achieves better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt. © 2023 Association for Computational Linguistics. (A minimal sketch of the CLIP-based ranking step appears after this record.)
  4. Keywords: Image retrieval; Learning systems; Natural language processing systems; Semantics; Zero-shot learning
  5. Source: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop; 2023, Pages 1960-1964; 978-195942999-9 (ISBN)
  6. URL: https://aclanthology.org/2023.semeval-1.269
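To illustrate the cross-modality evaluation described in the abstract, the sketch below scores candidate images against a simple text prompt with CLIP in a zero-shot setting and returns them best-first. It is a minimal sketch under stated assumptions, not the authors' exact pipeline: the checkpoint name, the "a photo of ..." prompt template, and the rank_candidates helper are illustrative choices, not details from the paper.

```python
# Minimal sketch of zero-shot Visual-WSD candidate ranking with CLIP
# (not the authors' exact pipeline; model checkpoint, prompt template,
# and helper name are illustrative assumptions).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_candidates(phrase, image_paths):
    """Score each candidate image against a simple prompt built from the
    ambiguous word's context phrase; return (path, score) pairs best-first."""
    prompt = f"a photo of {phrase}"  # "simple prompt-based" template (assumed)
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image holds one CLIP text-image similarity score per candidate
    scores = out.logits_per_image.squeeze(-1).tolist()
    return sorted(zip(image_paths, scores), key=lambda x: x[1], reverse=True)

# Usage: pick the image that best matches the phrase "river bank"
# best_path, _ = rank_candidates("river bank", candidate_paths)[0]
```

The "generated prompt-based" and text-to-image variants mentioned in the abstract would replace the fixed template with a completion-model prompt or compare CLIP image embeddings of generated and candidate images; those steps depend on external generation APIs and are not shown here.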