
Non-End-to-End Sign Language Translation with Large Language Models

Kamali, Kasra | 2024

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57070 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Hajsadeghi, Khosrow
  7. Abstract: In this thesis, we study sign language translation in a non-end-to-end framework built on large language models. The proposed method first uses a continuous sign language recognition model to generate the sequence of glosses corresponding to a sign language video. This gloss sequence is then passed to a language model, whose role in the framework is to translate the glosses into coherent spoken-language sentences. Training resources for machine translation models on the specific problem of sign-language-to-text translation are limited, so large language models can be highly beneficial here because of the knowledge they accumulate during pre-training. Given the current popularity of instruction-following large language models, we investigate how instruction fine-tuning affects a language model's performance on gloss-to-text translation. To this end, we formulate gloss-to-text translation as a set of instructions and evaluate several large language models with different architectures and pre-training methods on translating sign language glosses into spoken-language sentences. By incorporating the T5 language model into our framework, our method surpasses the previous best-performing solutions by 1.16 and 2.35 BLEU-4 points under the G2T and S2G → G2T evaluation protocols on the Phoenix dataset, respectively, and achieves near state-of-the-art results under the S2G2T protocol. Furthermore, our method outperforms the previous best solution by a significant margin of 4.47 BLEU-4 points on the ASLG dataset.
  8. Keywords: Large Language Model; Continuous Sign Recognition; Sign Language; Sign Language Recognition; Instruction Fine-Tuning; Non-End-to-End Framework; Sign Language Translation
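The abstract describes a two-stage (S2G → G2T) pipeline: a recognition model emits glosses, and an instruction-tuned language model turns them into a sentence. Below is a minimal Python sketch of that composition. The prompt template, the function names `build_instruction` and `sign_to_text`, and the `recognize` and `generate` callables (stand-ins for the recognition model and the language model, for example a T5 wrapper) are all hypothetical. The thesis's actual instruction wording and model interfaces are not given in this record.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical instruction template (assumption): the abstract does not give
# the exact instruction wording used in the thesis.
INSTRUCTION_TEMPLATE = (
    "Translate the following {sign_language} glosses into a "
    "{spoken_language} sentence.\n"
    "Glosses: {glosses}\n"
    "Sentence:"
)


@dataclass
class InstructionExample:
    """One gloss-to-text instruction: a prompt plus its reference sentence."""
    prompt: str
    target: str = ""  # reference sentence for fine-tuning; empty at inference


def build_instruction(
    glosses: Sequence[str],
    sign_language: str,
    spoken_language: str,
    target: str = "",
) -> InstructionExample:
    """Format a gloss sequence as an instruction-following example."""
    if not glosses:
        raise ValueError("gloss sequence must not be empty")
    prompt = INSTRUCTION_TEMPLATE.format(
        sign_language=sign_language,
        spoken_language=spoken_language,
        glosses=" ".join(glosses),
    )
    return InstructionExample(prompt=prompt, target=target)


def sign_to_text(
    video: object,
    recognize: Callable[[object], List[str]],
    generate: Callable[[str], str],
    sign_language: str,
    spoken_language: str,
) -> str:
    """Non-end-to-end translation: video -> glosses (S2G) -> sentence (G2T)."""
    glosses = recognize(video)  # stage 1: continuous sign language recognition
    example = build_instruction(glosses, sign_language, spoken_language)
    return generate(example.prompt).strip()  # stage 2: language model output


if __name__ == "__main__":
    # Stand-in callables with invented outputs, for illustration only.
    def fake_recognizer(video: object) -> List[str]:
        return ["MORGEN", "REGEN"]

    def fake_lm(prompt: str) -> str:
        return " Morgen regnet es. "

    print(build_instruction(["MORGEN", "REGEN"], "German Sign Language", "German").prompt)
    print(sign_to_text("clip.mp4", fake_recognizer, fake_lm, "German Sign Language", "German"))
```

Keeping the two stages as separate callables mirrors the non-end-to-end design. The same gloss-to-text step can be fed ground-truth glosses (the G2T protocol) or glosses produced by the recognizer (S2G → G2T) without changing the code. The glosses and sentence in the demo are invented placeholders, not data from the Phoenix dataset.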
