Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-Modality and Unimodality in Visual Word Sense Disambiguation

Taghavi, Z.; Haghighi Naeini, P.; Sadraei Javaheri, M. A.; Gooran, S.; Asgari, E.; Rabiee, H. R.; Sameti, H. (authors); Ojha, A. K.; Dogruoz, A. S.; Da San Martino, G.; Madabushi, H. T.; Kumar, R.; Sartori, E. (proceedings editors) | Sharif University of Technology

Taghavi, Z ; Sharif University of Technology | 2023

Type of Document: Article
Publisher: Association for Computational Linguistics, 2023
Abstract:
This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt. © 2023 Association for Computational Linguistics
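The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: ranking candidate images by their similarity to a query in a shared embedding space. It assumes embeddings have already been computed; the vectors below are hand-made toy values rather than real CLIP outputs, and the function names (`cosine`, `rank_candidates`, `accuracy`) are hypothetical. The query may be a text-prompt embedding (cross-modality) or the embedding of an image generated from the prompt (unimodality).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate indices sorted from most to least similar."""
    scores = [cosine(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def accuracy(predicted, gold):
    """Fraction of instances whose top-ranked candidate is the gold image."""
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Toy data (assumption): one query and three candidate image embeddings.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.1, 0.9, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_candidates(query, candidates)
print(ranking)  # [1, 0, 2]
```

In this sketch, swapping the query vector is the only change needed to compare the two settings, which keeps the ranking and scoring code identical across them.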
Keywords:
Image retrieval ; Learning systems ; Natural language processing systems ; Semantics ; Zero-shot learning
Source: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop ; 2023, Pages 1960-1964 ; 978-195942999-9 (ISBN)
URL: https://aclanthology.org/2023.semeval-1.269
