Multi-modal Keyword Extraction from Video Clip and its Description
Alizadeh Aghmashhadi, Farahmand | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57461 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Behroozi, Hamid; Asgari, Ehsaneddin
- Abstract:
- Keyword prediction has long been a widely used task in natural language processing. Traditionally, it was performed mainly on textual content. However, with the rapid growth of multimedia content and its use on social networks, the need to automatically extract and generate appropriate keywords for videos has also increased. Suitable keywords significantly improve content accessibility and visibility and enable better classification. With the spread of generative models, keyword estimation can also be formulated as a text generation task. Existing solutions have mostly focused on English-language content and usually perform poorly on Persian or cannot be used for it at all. Therefore, a dataset of short Persian-language videos was collected and built. It includes short video clips together with their titles, descriptions, tags, keyframes, a description of each frame, the audio content, and the corresponding transcripts. To model the problem, the open-source multimodal models PaliGemma, IDEFICS, and Qwen2-VL were applied to this dataset in two modes: zero-shot and LoRA fine-tuned. Keyword evaluation is usually based on exact matching at the lexical level. In this research, the models are instead evaluated from three aspects (reference agreement, diversity, and faithfulness) at both the lexical and semantic levels. For the test set, a number of videos were manually labeled, and the models' performance was evaluated against these labels. The results show that the fine-tuned models can generate relatively suitable Persian keywords using the information available in the text and images. These models can be used to improve search and content retrieval, in recommendation systems, or in keyword generation tools for content creators.
- Keywords:
- Keyword Extraction ; Natural Language Processing ; Machine Learning ; Keyword Generation
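Lexical-level scoring of the three evaluation aspects named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's evaluation code: the helper names (`normalize`, `exact_match_prf`, `distinct_ratio`, `lexical_faithfulness`), the Persian character normalization, and the specific formulas (set-based precision, recall, and F1 for reference agreement; unique-keyword ratio for diversity; substring containment in the source text for faithfulness) are all hypothetical choices. A semantic-level variant would typically replace string equality with a similarity threshold over text embeddings.

```python
import unicodedata

# Hypothetical normalization: map Arabic yeh (U+064A) and Arabic kaf (U+0643)
# to their Persian forms (U+06CC, U+06A9), apply NFC, lowercase, and collapse
# whitespace so trivially different spellings still count as exact matches.
_PERSIAN_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})


def normalize(keyword: str) -> str:
    text = unicodedata.normalize("NFC", keyword).translate(_PERSIAN_MAP)
    return " ".join(text.lower().split())


def _keyword_set(keywords: list[str]) -> set[str]:
    # Normalize and drop keywords that become empty.
    return {n for n in map(normalize, keywords) if n}


def exact_match_prf(predicted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Reference agreement: precision, recall, and F1 over exact normalized matches."""
    pred, ref = _keyword_set(predicted), _keyword_set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # hits > 0 implies precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def distinct_ratio(predicted: list[str]) -> float:
    """Diversity: fraction of non-empty predicted keywords that are unique after normalization."""
    normed = [n for n in map(normalize, predicted) if n]
    return len(set(normed)) / len(normed) if normed else 0.0


def lexical_faithfulness(predicted: list[str], source_text: str) -> float:
    """Faithfulness: fraction of unique predicted keywords found as substrings of the source text."""
    pred = _keyword_set(predicted)
    source = normalize(source_text)
    return sum(k in source for k in pred) / len(pred) if pred else 0.0
```

For example, predictions `["Football", "sport", "sport ", "goal"]` scored against references `["football", "sport", "world cup"]` give a precision, recall, and F1 of 2/3 each and a distinct ratio of 0.75. Substring containment is deliberately crude: it would, for instance, count "sport" as present in "transport", which is one reason to complement lexical scores with semantic ones.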