Effective fusion of deep multitasking representations for robust visual tracking

Marvasti Zadeh, S. M.; Sharif University of Technology | 2022

  1. Type of Document: Article
  2. DOI: 10.1007/s00371-021-02304-1
  3. Publisher: Springer Science and Business Media Deutschland GmbH, 2022
  4. Abstract: Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper aims to evaluate the performance of 12 state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best for visual tracking purposes. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, UAV-123 and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness of visual tracking. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
  5. Keywords: Appearance modeling; Deep convolutional neural networks; Discriminative correlation filters; Aircraft detection; Birds; Convolution; Convolutional neural networks; Deep neural networks; Learning systems; Semantic Segmentation; Wetlands; Appearance models; Convolutional neural network; Correlation filters; Deep convolutional neural network; Discriminative correlation filter; Feature map; Features extraction; Learning process; Robust visual tracking; Visual Tracking; Semantics
  6. Source: Visual Computer; Volume 38, Issue 12, 2022, Pages 4397-4417; 0178-2789 (ISSN)
  7. URL: https://link.springer.com/article/10.1007/s00371-021-02304-1
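
The abstract describes two steps at a high level: fusing semantic feature maps with ResNet-based feature maps, and weighting the result with semantic segmentation information. The record does not say how either step is computed, so the following is only an illustrative sketch under assumptions that are not taken from the paper: fusion is modeled as nearest-neighbour resizing followed by channel concatenation, and semantic weighting as an element-wise product with a per-pixel mask in [0, 1]. The names `resize_nearest`, `fuse` and `semantic_weighting` are hypothetical, and plain nested lists stand in for tensors to keep the example dependency-free.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Mask = List[List[float]]              # indexed as [row][col], values in [0, 1]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize of every channel to (out_h, out_w)."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [channel[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)
        ]
        for channel in fmap
    ]


def fuse(appearance: FeatureMap, semantic: FeatureMap) -> FeatureMap:
    """Assumed fusion: resize `semantic` to the appearance resolution,
    then concatenate along the channel axis."""
    h, w = len(appearance[0]), len(appearance[0][0])
    return appearance + resize_nearest(semantic, h, w)


def semantic_weighting(fmap: FeatureMap, mask: Mask) -> FeatureMap:
    """Assumed weighting: scale every channel element-wise by the mask,
    which must have the same height and width as the feature map."""
    return [
        [[v * m for v, m in zip(row, mask_row)] for row, mask_row in zip(channel, mask)]
        for channel in fmap
    ]


# Toy data: one 2x2 appearance channel and one 1x1 semantic channel.
appearance = [[[1.0, 2.0], [3.0, 4.0]]]
semantic = [[[5.0]]]
mask = [[1.0, 0.0], [0.5, 1.0]]  # low values suppress likely background pixels

fused = fuse(appearance, semantic)
weighted = semantic_weighting(fused, mask)
print(len(weighted), "channels after fusion and weighting")
```

In this toy setting, background pixels (mask value 0) contribute nothing to the fused representation, which is one plausible way a segmentation-derived weight could limit drift; the published method may differ substantially.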