Design of Multi-Object Tracking Algorithms Based on Transformer Models

Ramezan Dehnavi, Mohammad Amin; Hashemi, Matin

Ramezan Dehnavi, Mohammad Amin | 2023

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 56326 (05)
University: Sharif University of Technology
Department: Electrical Engineering
Advisor(s): Hashemi, Matin
Abstract:
Multi-Object Tracking (MOT) plays a crucial role in computer vision applications such as autonomous vehicles, surveillance, and robotics. Traditional MOT methods often suffer from high error rates in complex scenarios involving occlusion, scale variation, and object interactions. Recent advances in deep learning, particularly Convolutional Neural Networks (CNNs) and Transformer models, have shown significant capability in addressing these challenges. This thesis studies the use of deep learning techniques, specifically CNNs and Transformer models, for Multi-Object Tracking. It begins by reviewing the existing literature on MOT methods, including CNN-based approaches that serve as a foundation for object detection, feature extraction, and re-identification. It then examines the principles and architectures of Transformer models, focusing on models designed for classification and object detection. Building on the success of Transformer models in computer vision, the thesis explores their capabilities, advantages, and limitations in solving the object tracking problem. To improve MOT metrics, especially those related to object re-identification and the trade-off between the assignment and detection tasks, and to make MOT models more robust to camera movement, a spatial-temporal aggregation block is introduced. This block can be added to any Transformer-based MOT model that uses tracking queries in combination with detection queries. Finally, the thesis investigates each component of the proposed aggregation block and demonstrates its impact on final performance, showing that incorporating the spatial-temporal aggregation block alongside Transformer models can improve almost all tracking metrics. Additionally, because the block is decomposable, its components can be used separately to enhance the model's performance in different aspects, depending on the application.
Keywords:
Computer Vision ; Multi-Object Tracking ; Deep Learning ; Transformer Model ; Pedestrian Tracking ; Spatial-Temporal Aggregation Block ; Convolutional Neural Network
