Investigating the Mathematical Model of Transformers in Deep Learning Using the Self-Attention Mechanism in Multi-Layer Neural Networks
Mirjavadi, Mojtaba | 2025
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 58268 (02)
- University: Sharif University of Technology
- Department: Mathematical Sciences
- Advisor(s): Bahraini, Alireza; Shams Yousefi, Marzieh
- Abstract:
- This thesis is organized into three chapters. The first chapter presents the necessary concepts and theorems. The second chapter addresses the primary objectives of the study. The third chapter is devoted to the proof and analysis of the convergence theorem for the attention matrix to low-rank Boolean matrices. The objective of this thesis is to investigate the mathematical model of the self-attention mechanism in deep learning through the lens of optimal transport theory. It includes two theorems concerning clustering and the convergence of solutions of the differential equation describing the self-attention mechanism.
- Keywords:
- Deep Learning ; Transformers ; Clustering ; Continuity Equation ; Optimal Transport ; Self-Attention
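The abstract describes self-attention as a differential equation whose solutions cluster. As a rough illustration of that viewpoint (not the thesis's own construction), the sketch below simulates a simplified continuous-time variant in which each token drifts toward its attention-weighted average; the query/key/value matrices are taken as identity and the step size `dt` is an arbitrary choice, so this is only a minimal toy model of the clustering phenomenon:

```python
import numpy as np

def attention_dynamics(X, steps=200, dt=0.1):
    """Euler-integrate dx_i/dt = sum_j p_ij x_j - x_i, where
    p_ij = softmax_j(<x_i, x_j>) is the self-attention matrix
    (Q = K = V = I, a deliberately simplified setting)."""
    X = X.copy()
    for _ in range(steps):
        scores = X @ X.T                      # pairwise similarities
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)     # row-stochastic attention matrix
        # convex step toward the attention-weighted barycenter of all tokens
        X = (1 - dt) * X + dt * (P @ X)
    return X

def diameter(X):
    """Largest pairwise distance among token positions."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()
```

Because each update is a convex combination of the current token positions, the diameter of the point cloud is non-increasing, which is the elementary mechanism behind the clustering behavior the abstract refers to.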
-
