Human-Robot Interaction in Real Environments by Audio-Visual Integration

  • Kim, Hyun-Don (Intelligent Robotics Research Center at KIST)
  • Choi, Jong-Suk (Intelligent Robotics Research Center at KIST)
  • Kim, Mun-Sang (Center for Intelligent Robotics, Frontier 21 Program at KIST)
  • Published : 2007.02.28

Abstract

In this paper, we developed a reliable sound localization system that uses three microphones and includes a VAD (Voice Activity Detection) component, as well as a face tracking system that uses a vision camera. We also proposed a way to integrate these three systems for human-robot interaction, so that errors in localizing a speaker are compensated for and unwanted speech or noise arriving from undesired directions is effectively rejected. To verify the system's performance, we installed the proposed audio-visual system in a prototype robot called IROBAA (Intelligent ROBot for Active Audition) and demonstrated how the audio-visual components are integrated.
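The abstract does not give the exact fusion rule, so the following is only a minimal sketch of one plausible way to combine the three components, under stated assumptions. It assumes the VAD produces a per-frame speech flag, the sound localizer produces an azimuth in degrees relative to the robot's front, and the face tracker produces the azimuths of any faces in view. The function and parameter names (`fuse_audio_visual`, `camera_fov`, `tolerance_deg`) and their default values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class FusedEstimate:
    azimuth_deg: float  # final speaker direction, degrees relative to robot front
    source: str         # "face" if snapped to a detected face, "audio" otherwise


def wrap_deg(a: float) -> float:
    """Map an azimuth to the range [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def fuse_audio_visual(
    sound_azimuth: float,
    voice_active: bool,
    face_azimuths: Sequence[float],
    camera_fov: Tuple[float, float] = (-30.0, 30.0),  # hypothetical field of view
    tolerance_deg: float = 15.0,                      # hypothetical matching window
) -> Optional[FusedEstimate]:
    # 1. VAD gate: frames without speech are ignored as noise.
    if not voice_active:
        return None

    # 2. If a detected face lies close to the sound direction, use the
    #    visual azimuth to correct the audio localization error.
    if face_azimuths:
        nearest = min(face_azimuths,
                      key=lambda f: angular_difference(f, sound_azimuth))
        if angular_difference(nearest, sound_azimuth) <= tolerance_deg:
            return FusedEstimate(wrap_deg(nearest), "face")

    # 3. Speech from inside the camera's view with no face nearby is treated
    #    as coming from an undesired direction (e.g. a loudspeaker) and rejected.
    lo, hi = camera_fov
    if lo <= wrap_deg(sound_azimuth) <= hi:
        return None

    # 4. Speech from outside the view cannot be checked visually yet; keep the
    #    audio estimate so the robot can turn toward it.
    return FusedEstimate(wrap_deg(sound_azimuth), "audio")
```

For example, `fuse_audio_visual(20.0, True, [25.0])` returns `FusedEstimate(azimuth_deg=25.0, source='face')`: the audio estimate of 20° is snapped to the face at 25°. In this sketch, accepting out-of-view speech on audio alone is a deliberate choice, since the camera cannot yet confirm or refute a speaker there.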
