Design of Multi-Object Tracking Algorithms Based on Transformer Models
Ramezan Dehnavi, Mohammad Amin | 2023
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 56326 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Hashemi, Matin
- Abstract:
- Multi-Object Tracking (MOT) plays a crucial role in computer vision applications such as autonomous vehicles, surveillance, and robotics. Traditional MOT methods often produce high error rates in complex scenarios involving occlusion, scale variation, and object interactions. Recent advances in deep learning, particularly Convolutional Neural Networks (CNNs) and Transformer models, have shown significant capability in addressing these challenges. This thesis studies the use of deep learning techniques, specifically CNNs and Transformer models, for Multi-Object Tracking. It begins by reviewing the existing literature on MOT methods, including CNN-based approaches as a foundation for object detection, feature extraction, and re-identification. It then examines the principles and architectures of Transformer models, focusing on models designed for classification and object detection. Building on the success of Transformer models in computer vision, the thesis explores their capabilities, advantages, and limitations in solving the object tracking problem. To improve MOT metrics, especially those related to object re-identification and the trade-off between the assignment and detection tasks, and to make MOT models more robust to camera movement, a spatial-temporal aggregation block is introduced. This block can be added to any Transformer-based MOT model that uses tracking queries in combination with detection queries. Finally, the thesis investigates each component of the proposed aggregation block and demonstrates its impact on final performance, showing that adding the spatial-temporal aggregation block to Transformer models can improve almost all tracking metrics. Additionally, because the block is decomposable, its components can be used separately to enhance the model's performance in different aspects, depending on the application.
- Keywords:
- Computer Vision ; Multi-Object Tracking ; Deep Learning ; Transformer Model ; Pedestrian Tracking ; Spatial-Temporal Aggregation Block ; Convolutional Neural Network