DOI QR코드

DOI QR Code

A Survey for 3D Object Detection Algorithms from Images

  • Lee, Han-Lim (Department of IT Engineering, Sookmyung Women's University) ;
  • Kim, Ye-ji (Department of IT Engineering, Sookmyung Women's University) ;
  • Kim, Byung-Gyu (Department of IT Engineering, Sookmyung Women's University)
  • Received : 2022.09.09
  • Accepted : 2022.09.20
  • Published : 2022.09.30

Abstract

Image-based 3D object detection is one of the important and difficult problems in autonomous driving and robotics, and aims to find and represent the location, dimension and orientation of the object of interest. It generates three dimensional (3D) bounding boxes with only 2D images obtained from cameras, so there is no need for devices that provide accurate depth information such as LiDAR or Radar. Image-based methods can be divided into three main categories: monocular, stereo, and multi-view 3D object detection. In this paper, we investigate the recent state-of-the-art models of the above three categories. In the multi-view 3D object detection, which appeared together with the release of the new benchmark datasets, NuScenes and Waymo, we discuss the differences from the existing monocular and stereo methods. Also, we analyze their performance and discuss the advantages and disadvantages of them. Finally, we conclude the remaining challenges and a future direction in this field.

Keywords

Acknowledgement

This research was supported by DNA+Drone Technology Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. NRF-2020M3C1C2A01080819).

References

  1. Y. Zhang, Z. Zhou, P. David, X. Yue, Z. Xi, and B. Gong, et al.,"Polarnet: An improved grid representation for online lidar point clouds semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9601-9610.
  2. X. Zhu, H. Zhou, T. Wang, F. Hong, Y. Ma, and W. Li, et al., "Cylindrical and asymmetrical 3D convolution networks for lidar segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9939-9948.
  3. T. Yin, X. Zhou, and P. Krahenbuhl, "Center-based 3D object detection and tracking," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11784-11793.
  4. X. Zhu, Y. Ma, T. Wang, Y. Xu, J. Shi, and D. Lin, "Ssn: Shape signature networks for multi-class object detection from point clouds," in European Conference on Computer Vision, Springer, 2020, pp. 581-597.
  5. Y. Lu, X. Ma, L. Yang, T. Zhang, Y. Liu, and Q. Chu, "Geometry uncertainty projection network for monocular 3D object detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2020, pp. 3111-3121.
  6. X. Liu, N. Xue, and T. Wu, "Learning auxiliary monocular contexts helps monocular 3D object detection," arXiv preprint arXiv:2112.04628, 2021, unpublished.
  7. P. Li, X. Chen, and S. Shen, "Stereo r-cnn based 3D object detection for autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7644-7652.
  8. J. Sun, L. Chen, Y. Xie, S. Zhang, Q. Jiang, X. Zhou, and H. Bao, "Disp r-cnn: Stereo 3D object detection via shape prior guided instance disparity estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10548-10557.
  9. Z. Li, W. Wang, H. Li, E. Xie, C. Sima, and T. Lu, et al., "BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers," arXiv preprint arXiv:2203.17270, 2022, unpublished.
  10. Y. Liu, T. Wang, X. Zhang, and J. Sun, "Petr: Position embedding transformation for multi-view 3D object detection," arXiv preprint arXiv:2203.05625, 2022, unpublished.
  11. Y. Jiang, L. Zhang, Z. Miao, X. Zhu, J. Gao, and W. Hu et al., "PolarFormer: Multi-camera 3D object detection with polar transformers," arXiv preprint arXiv:2206. 15398, 2022, unpublished.
  12. E. Arnold, O. Y. Al-Jarrah, M. Dianati, S. Fallah, D. Oxtoby, and A. Mouzakitis, "A survey on 3d object detection methods for autonomous driving applications," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 10, pp. 3782-3795, 2019. https://doi.org/10.1109/tits.2019.2892405
  13. Z. Li, Y. Du, M. Zhu, S. Zhou, and L. Zhang, "A survey of 3D object detection algorithms for intelligent vehicles development," Artificial Life and Robotics, pp. 1-8, 2021.
  14. C. Reading, A. Harakeh, J. Chae, and S. L. Waslander, "Categorical depth distribution network for monocular 3D object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8555-8564.
  15. Y. Chen, S. Liu, X. Shen, and J. Jia, "Dsgn: Deep stereo geometry network for 3D object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12536-12545.
  16. A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The kitti vision benchmark suite," in Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE. 2012, pp. 3354-3361.
  17. H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, and Q. Xu, et al., "nuscenes: A multimodal dataset for autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621-11631.
  18. P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, and P. Tsui, "Scalability in perception for autonomous driving: Waymo open dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446-2454.
  19. N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko,"End-to-end object detection with transformers," in Proceedings of European Conference on Computer Vision, Springer, 2020, pp. 213-229.
  20. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition, " in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
  21. Y. Lee, J. W. Hwang, S. Lee, Y. Bae, and J. Park, "An energy and gpu-computation efficient backbone network for real-time object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  22. T. Y. Lin, P. Dollar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
  23. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, and Z. Zhang, et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012-10022.