Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
Taghavi, Z ; Sharif University of Technology | 2023
- Type of Document: Article
- Publisher: Association for Computational Linguistics, 2023
- Abstract:
- This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt. © 2023 Association for Computational Linguistics
- Keywords:
- Image retrieval ; Learning systems ; Natural language processing systems ; Semantics ; Zero-shot learning
- Source: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop ; 2023, Pages 1960-1964 ; 978-195942999-9 (ISBN)
- URL: https://aclanthology.org/2023.semeval-1.269
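The retrieval step described in the abstract, choosing the candidate image that best matches a query, can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors below are made-up toy values standing in for CLIP embeddings, and the names `cosine` and `rank_candidates` are hypothetical. In a cross-modality setting the query vector would be a text embedding of the prompt. In a unimodality setting it would presumably be the embedding of an image generated from that prompt, compared against the candidate image embeddings in the same way.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(candidate_vecs)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for embeddings (assumed values, not real CLIP output).
query_vec = [0.9, 0.1, 0.0]      # e.g. the embedded prompt for one word sense
candidate_vecs = [
    [0.0, 1.0, 0.0],             # candidate image 0
    [1.0, 0.2, 0.0],             # candidate image 1
    [0.0, 0.0, 1.0],             # candidate image 2
]

ranking = rank_candidates(query_vec, candidate_vecs)
best = ranking[0]                # index of the predicted image
```

Accuracy in this setting would then be the fraction of test instances for which `best` equals the index of the gold image.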