Investigating the Mathematical Model of Transformers in Deep Learning Using the Self-Attention Mechanism in Multi-Layer Neural Networks
Mirjavadi, Mojtaba | 2025
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 58268 (02)
- University: Sharif University of Technology
- Department: Mathematical Sciences
- Advisor(s): Bahraini, Alireza; Shams Yousefi, Marzieh
- Abstract:
- This thesis is organized into three chapters. The first chapter presents the necessary concepts and theorems. The second chapter addresses the primary objectives of the study. The third chapter is devoted to the proof and analysis of the convergence theorem for the attention matrix to low-rank Boolean matrices. The objective of this thesis is to investigate the mathematical model of the self-attention mechanism in deep learning through the lens of optimal transport theory. It includes two theorems concerning clustering and the convergence of solutions of the differential equation describing the self-attention mechanism.
- Keywords:
- Deep Learning ; Transformers ; Clustering ; Continuity Equation ; Optimal Transport ; Self-Attention
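The abstract describes self-attention as a differential equation whose solutions cluster. As a rough illustration of that viewpoint (not the thesis's own construction), the sketch below simulates a simplified continuous-time variant in which each token drifts toward its attention-weighted average; the query/key/value matrices are taken as identity and the step size `dt` is an arbitrary choice, so this is only a minimal toy model of the clustering phenomenon:

```python
import numpy as np

def attention_dynamics(X, steps=200, dt=0.1):
    """Euler-integrate dx_i/dt = sum_j p_ij x_j - x_i, where
    p_ij = softmax_j(<x_i, x_j>) is the self-attention matrix
    (Q = K = V = I, a deliberately simplified setting)."""
    X = X.copy()
    for _ in range(steps):
        scores = X @ X.T                      # pairwise similarities
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)     # row-stochastic attention matrix
        # convex step toward the attention-weighted barycenter of all tokens
        X = (1 - dt) * X + dt * (P @ X)
    return X

def diameter(X):
    """Largest pairwise distance among token positions."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()
```

Because each update is a convex combination of the current token positions, the diameter of the point cloud is non-increasing, which is the elementary mechanism behind the clustering behavior the abstract refers to.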
-
