
Effective fusion of deep multitasking representations for robust visual tracking

Marvasti Zadeh, S. M.; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1007/s00371-021-02304-1
  3. Publisher: Springer Science and Business Media Deutschland GmbH, 2021
  4. Abstract: Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper aims to evaluate the performance of 12 state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best for visual tracking purposes. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, UAV-123 and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness of visual tracking. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
  5. Keywords: Aircraft detection ; Convolution ; Convolutional neural networks ; Semantics ; Wetlands ; Appearance models ; Correlation filters ; Discriminative correlation filter ; Feature map ; Features extraction ; Filter-based ; Learning process ; Robust visual tracking ; Visual object tracking ; Visual Tracking ; Deep neural networks
  6. Source: Visual Computer; 2021; 0178-2789 (ISSN)
  7. URL: https://link.springer.com/article/10.1007/s00371-021-02304-1
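As background for the fusion described in the abstract, the sketch below illustrates the general idea of combining deep appearance feature maps from a ResNet backbone with semantic segmentation maps, and of using the segmentation output as a spatial weight to suppress background locations. It is a minimal illustration only, assuming torchvision's resnet50 and fcn_resnet50 models and an arbitrary target_class index; it is not the authors' implementation, which ranks 12 ResNet-based FENs and learns continuous convolution filters on top of the fused representation.

```python
# Hypothetical sketch (not the paper's code): fuse ResNet appearance features
# with semantic segmentation maps and derive a per-location semantic weight.
import torch
import torch.nn.functional as F
import torchvision

def extract_backbone_features(frame, backbone):
    """Run a frame through the early ResNet stages and return a deep feature map."""
    x = backbone.conv1(frame)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    x = backbone.layer1(x)
    x = backbone.layer2(x)
    x = backbone.layer3(x)  # stride-16, semantically rich feature map
    return x

def fuse_with_semantics(backbone_feat, seg_logits, target_class):
    """Concatenate appearance and segmentation features, then weight each
    location by the probability of the (assumed) target class."""
    # Resize segmentation logits to the spatial size of the backbone features.
    seg = F.interpolate(seg_logits, size=backbone_feat.shape[-2:],
                        mode="bilinear", align_corners=False)
    sem_prob = seg.softmax(dim=1)                         # per-class probabilities
    weight = sem_prob[:, target_class:target_class + 1]   # target likelihood map
    fused = torch.cat([backbone_feat, seg], dim=1)        # channel-wise fusion
    return fused * weight                                 # down-weight background

if __name__ == "__main__":
    frame = torch.rand(1, 3, 224, 224)                    # dummy video frame
    backbone = torchvision.models.resnet50()              # appearance FEN (untrained here)
    segmenter = torchvision.models.segmentation.fcn_resnet50()  # fully convolutional FEN
    backbone.eval()
    segmenter.eval()
    with torch.no_grad():
        feat = extract_backbone_features(frame, backbone)
        seg_logits = segmenter(frame)["out"]
        fused = fuse_with_semantics(feat, seg_logits, target_class=15)
    print(fused.shape)
```

In an actual tracker, both networks would load pretrained weights, and the weighted fused map would feed the correlation-filter learning stage rather than being used directly.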