

A reliable and accurate 3D tracking framework is essential for predicting the future locations of surrounding objects and planning the observer’s actions in applications such as autonomous driving. We propose a framework that effectively associates moving objects over time and estimates their full 3D bounding box information from a sequence of 2D images captured from a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints using appearance cues only. After the initial 2D association, we further use 3D bounding-box depth-ordering heuristics for robust instance association and motion-based 3D trajectory prediction for re-identification of occluded vehicles. Finally, an LSTM-based object velocity learning module aggregates long-term trajectory information for more accurate motion extrapolation. Experiments on our proposed simulation data and on real-world benchmarks, including the KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking in urban driving scenarios. On the Waymo Open benchmark, we establish the first camera-only baseline in the 3D tracking and 3D detection challenges. On the nuScenes 3D tracking benchmark, our quasi-dense 3D tracking pipeline reaches nearly five times the tracking accuracy of the best vision-only submission among all published methods.
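To make the association step described above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each track and detection carries an appearance embedding and a 3D center, scores each pair by cosine similarity of the embeddings plus a motion term, and matches greedily. A constant-velocity extrapolation stands in for the LSTM-based velocity module, and a simple center-distance score stands in for the depth-ordering heuristics. All names (`associate`, `predict_center`, the dictionary keys, the weights and thresholds) are hypothetical choices for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_center(track):
    """Constant-velocity extrapolation of a track's 3D center.

    Assumption: this replaces the learned LSTM motion model with the
    simplest possible predictor, one step ahead.
    """
    return tuple(c + v for c, v in zip(track["center"], track["velocity"]))


def associate(tracks, detections, w_app=0.7, w_motion=0.3,
              max_dist=5.0, min_score=0.5):
    """Greedily match tracks to detections; returns {track_idx: det_idx}.

    The weights and thresholds are illustrative, not tuned values.
    """
    scored = []
    for ti, track in enumerate(tracks):
        predicted = predict_center(track)
        for di, det in enumerate(detections):
            appearance = cosine(track["embedding"], det["embedding"])
            dist = math.dist(predicted, det["center"])
            # Motion score falls linearly to 0 at max_dist (in meters).
            motion = max(0.0, 1.0 - dist / max_dist)
            score = w_app * appearance + w_motion * motion
            if score >= min_score:
                scored.append((score, ti, di))

    # Take the highest-scoring pairs first; each track and detection
    # can be used at most once.
    scored.sort(reverse=True)
    matches, used_tracks, used_dets = {}, set(), set()
    for _, ti, di in scored:
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


tracks = [
    {"embedding": (1.0, 0.0), "center": (0.0, 0.0, 10.0), "velocity": (0.0, 0.0, 1.0)},
    {"embedding": (0.0, 1.0), "center": (3.0, 0.0, 20.0), "velocity": (0.0, 0.0, -1.0)},
]
detections = [
    {"embedding": (0.1, 0.9), "center": (3.0, 0.0, 19.2)},
    {"embedding": (0.9, 0.1), "center": (0.0, 0.0, 11.1)},
]
matches = associate(tracks, detections)
print(matches)
```

In this toy input the detections arrive in swapped order, so the first track is matched to the second detection and vice versa. A track with no match above `min_score` is simply absent from the result; a fuller tracker would keep extrapolating such a track for a few frames so that it can be re-identified after an occlusion.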

Video Overview

Code & Dataset

Monocular Quasi-Dense 3D Object Tracking

Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun
  author = {Hu, Hou-Ning and Yang, Yung-Hsu and Fischer, Tobias and Yu, Fisher and Darrell, Trevor and Sun, Min},
  title = {Monocular Quasi-Dense 3D Object Tracking},
  journal = {ArXiv:2103.07351},
  year = {2021}

ICCV 2019: Joint Monocular 3D Vehicle Detection and Tracking

Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
  author = {Hu, Hou-Ning and Cai, Qi-Zhi and Wang, Dequan
  and Lin, Ji and Sun, Min and Krähenbühl, Philipp and
  Darrell, Trevor and Yu, Fisher},
  title = {Joint Monocular 3D Vehicle Detection and Tracking},
  booktitle = {IEEE International Conference on Computer Vision (ICCV)},
  year = {2019}


PhD Fellowship Program