Research works


  • Lip Motion Tracking
  • Face Alignment, Tracking & Modeling
  • Audio-Visual Based Human Detection & Tracking

A probabilistic dynamic contour model is proposed to track detailed non-rigid shapes in cluttered images. The basic idea is to combine a probabilistic tracking framework with a spatial-temporal active contour model. Instead of searching for the shape directly from image features, the probabilistic tracking framework provides multiple-hypothesis prediction and measurement, which makes it well suited to tracking in cluttered images or with sparse image features, while the active contour model is well suited to extracting detailed shape. Within the probabilistic tracking framework, contour samples are diffused by a new noise model and then evolved by the active contour model. Applying the probabilistic dynamic contour model to tracking the human outer lip contour yields the following results: (1) it improves the tracking of large inter-frame lip motion compared with the existing ICondensation algorithm; (2) it is suitable for tracking the lip contours of different people under different fixed poses.
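The predict-diffuse-measure cycle described above can be sketched as a Condensation-style particle filter step. This is a minimal illustration, not the paper's algorithm: the `likelihood` callable stands in for the combined active-contour evolution and image-measurement stage, and the isotropic Gaussian diffusion stands in for the paper's new noise model.

```python
import numpy as np

def condensation_step(particles, weights, likelihood, noise_std=1.0, rng=None):
    """One predict-measure cycle of a Condensation-style contour tracker.

    particles: (N, D) array, each row a shape-parameter hypothesis.
    likelihood: callable scoring one hypothesis against the image
                (a stand-in for contour evolution + matching).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    # Resample hypotheses in proportion to their previous weights.
    idx = rng.choice(n, size=n, p=weights)
    # Diffuse each surviving sample with process noise (the paper's
    # noise model would replace this simple isotropic Gaussian).
    predicted = particles[idx] + rng.normal(0.0, noise_std, particles.shape)
    # Measure: score each diffused hypothesis against the image.
    new_weights = np.array([likelihood(p) for p in predicted])
    new_weights /= new_weights.sum()
    # Point estimate: weighted mean of the hypotheses.
    estimate = new_weights @ predicted
    return predicted, new_weights, estimate
```

Because many weighted hypotheses are kept alive at once, a sample near a large inter-frame displacement can survive and be reinforced, which is what makes the multiple-hypothesis framework tolerant of clutter and fast motion.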

We propose an Affine Transform Insensitive Initialization Algorithm (ATIIA) to initialize the feature points of a face alignment algorithm based on the Active Shape Model (ASM). Traditional ASM has limited expressive power at its model vertices, and under affine transformations it is prone to poor initialization and to falling into local minima during the search. Our algorithm overcomes these shortcomings: it locates facial feature points in multi-view facial images precisely, robustly, and quickly, and it also copes with partial occlusion.
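One common way to make an ASM initialization insensitive to affine transforms is to fit the affine map between a few reliably detected anchor landmarks and their mean-shape counterparts, then carry the whole mean shape into the image frame. The sketch below illustrates that idea only; the anchor-based scheme and all names are assumptions, not the published ATIIA procedure.

```python
import numpy as np

def affine_init(mean_shape, anchors_model, anchors_image):
    """Initialize ASM feature points under an unknown affine transform.

    Fits the 6-DOF affine map carrying a few anchor landmarks (e.g. eye
    and mouth corners) from the mean-shape frame to the image, then
    applies it to every mean-shape vertex. Illustrative sketch only.
    """
    # Build the least-squares system  [x y 1] @ A = [x' y'].
    n = len(anchors_model)
    X = np.hstack([anchors_model, np.ones((n, 1))])        # (n, 3)
    A, *_ = np.linalg.lstsq(X, anchors_image, rcond=None)  # (3, 2)
    # Map every mean-shape vertex into the image frame.
    M = np.hstack([mean_shape, np.ones((len(mean_shape), 1))])
    return M @ A
```

Three non-collinear anchors determine the affine map exactly; using more anchors turns the fit into a least-squares estimate, which helps when individual detections are noisy or occluded.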

We propose to perform human tracking based on audio-visual information fusion in complicated, dynamic environments. First, audio and visual information are used separately to track the target, and two prior distributions are constructed from the resulting estimates. Particles are then sampled from these distributions and weighted by an audio-visual fused observation model; the posterior distribution is computed, yielding the final tracking result. Audio and visual information are treated in a symmetric manner so that each can compensate for the other, and each modality is assigned a weight reflecting its reliability. Experiments show that our algorithm is more robust than tracking based on visual information alone, and it tolerates lighting changes, background changes, and occlusion to some extent. Based on this algorithm we implemented a real-time human tracking system, which can be used in the smart classroom to track the speaker and capture his or her voice.
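The fusion step described above can be sketched as importance sampling from the two modality priors with a reliability-weighted fused observation model. This is a minimal illustration under stated assumptions: the prior and likelihood callables are hypothetical stand-ins for the audio-only and visual-only trackers and their observation models, and the product-of-likelihoods fusion is one plausible choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def fuse_av_step(audio_prior, visual_prior, audio_lik, visual_lik,
                 alpha=0.5, n=200, rng=None):
    """One audio-visual fusion step (illustrative sketch).

    audio_prior / visual_prior: callables (n, rng) -> candidate states,
        drawn around the separate audio-only and visual-only results.
    alpha: reliability weight of audio (1 - alpha for visual).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample symmetrically from both prior distributions.
    n_a = int(alpha * n)
    particles = np.concatenate([audio_prior(n_a, rng),
                                visual_prior(n - n_a, rng)])
    # Fused observation model: reliability-weighted product of the
    # per-modality likelihoods.
    w = np.array([audio_lik(p) ** alpha * visual_lik(p) ** (1 - alpha)
                  for p in particles])
    w /= w.sum()
    # Posterior point estimate.
    return w @ particles, particles, w
```

Because particles drawn from the audio prior are still scored by the visual likelihood (and vice versa), a modality that momentarily fails, e.g. under occlusion or silence, is down-weighted rather than allowed to derail the track.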