
Video Analysis and Video Mining:

Web Video Topic Discovery and Tracking
Detection of Near-Duplicates in Web Video Database
Object Detection and Tracking

Introduction to Web Video Topic Discovery and Tracking:

  Automatic topic discovery and tracking on web-shared videos can greatly benefit both web service providers and end users. Most current solutions for topic detection and tracking were developed for news and cannot be applied directly to web videos, because web videos carry much less semantic information than news videos. In this paper, we propose a bipartite graph model to address this issue. The bipartite graph represents the correlation between web videos and their keywords, and automatic topic discovery is achieved in two steps: coarse topic filtering and fine topic re-ranking. First, a weight-updating co-clustering algorithm is employed to filter out topic candidates at a coarse level. Then the videos on each topic are re-ranked by analyzing the link structures of the corresponding bipartite graph. After the topics are discovered, the interesting ones can also be tracked over a period of time using the same bipartite graph model. The key is to propagate the relevance scores and keywords from the videos of interest to other relevant ones through the bipartite graph links. Experimental results on real web videos from YouKu, a YouTube counterpart in China, demonstrate the effectiveness of the proposed methods.

Introduction to Detection of Near-Duplicates in Web Video Database:

  With the explosive growth of online videos, web video databases tend to contain a tremendous number of near-duplicates. How to detect and eliminate these near-duplicates has become an essential problem for web video storage and indexing. In this paper, we propose a hierarchical approach to solve this problem efficiently and effectively. For an incoming video, we first compare global features to exclude videos with large differences. Then a pairwise comparison of key-frame-level features, which has a higher computation cost, is performed. For the pairwise comparison, we adopt the Maximum Matching (MM) framework on a bipartite graph. Instead of the standard, well-known MM algorithm, a simple approximate MM algorithm and a graph cut algorithm are applied for higher speed. Experimental results on two databases totaling more than 20,000 videos demonstrate our algorithm’s excellent performance in both speed and accuracy.

Introduction to Object Detection and Tracking:

  Multiple Object Tracking (MOT) poses three challenges to conventional, well-studied Single Object Tracking (SOT) algorithms: 1) multiple targets make the configuration space grow exponentially with the number of targets; 2) the multiple motion conditions caused by targets entering, exiting, and intersecting degrade the precision of the prediction process; 3) visual ambiguities among nearby targets make the trackers error-prone. In this paper, we address the MOT problem by embedding contextual proposal distributions and contextual observation models into a mixture tracker, which is implemented in a particle filter framework. The proposal distributions are adaptively selected according to the motion conditions of the targets, which are determined from context information, and multiple features are combined according to their discriminative power between ambiguity-prone objects. Introducing the contextual proposal distribution and observation model helps to overcome the inability of the conventional mixture tracker to handle object occlusions, while retaining its merits of flexibility and high efficiency. Experiments show significant improvement over other methods in tracking scenarios with a variable number of objects.

Content about Web Video Topic Discovery and Tracking:

  Topic discovery is achieved in two steps: coarse topic filtering and fine topic re-ranking. First, information-theoretic co-clustering is employed to filter web video topics at a coarse level. This is an unsupervised algorithm that uses the co-occurrence table of the two sides of the bipartite graph (videos and keywords) and obtains the clusters of videos and keywords simultaneously. To reduce the influence of noisy keywords, we propose a weight-updating strategy, which assigns each keyword a weight reflecting its impact on the co-occurrence table and updates the weights iteratively based on the videos’ cluster information. Then, the videos on the discovered topics are re-ranked by analyzing the bipartite graph’s link structures, which can be implemented as an iterative reinforcement process. The re-ranking step can be treated as a fine topic filtering step: based on the re-ranking results, website organizers can recommend the top N videos to customers and remove the least relevant videos. After the topics are discovered, the interesting ones can also be tracked over a period of time using the same bipartite graph model. The basic idea is to propagate the relevance scores from pre-defined videos and keywords to other relevant ones through the bipartite graph’s links, which can also be achieved by an iterative reinforcement process. After convergence, the relevant videos are ranked higher than the irrelevant ones.

The framework of topic discovery and tracking by bipartite graph model
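
The iterative reinforcement idea used for re-ranking and tracking can be illustrated with a small sketch. The page does not give the exact update rule, so the code below substitutes a plain random walk with restart on the video-keyword bipartite graph; the function name `propagate_scores`, the `damping` value, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def propagate_scores(edges, seeds=None, damping=0.85, iterations=50):
    """Rank videos by propagating scores over a video-keyword bipartite graph.

    edges: list of (video, keyword) pairs.
    seeds: optional dict video -> prior relevance (the "videos of interest");
           with no seeds every video starts with the same prior.
    NOTE: this is a random walk with restart, used here as a stand-in for the
    reinforcement process described in the text (an assumption).
    """
    videos = sorted({v for v, _ in edges})
    keywords = sorted({k for _, k in edges})
    video_to_keywords = defaultdict(list)
    keyword_to_videos = defaultdict(list)
    for v, k in edges:
        video_to_keywords[v].append(k)
        keyword_to_videos[k].append(v)

    if seeds:
        total = sum(seeds.values())
        prior = {v: seeds.get(v, 0.0) / total for v in videos}
    else:
        prior = {v: 1.0 / len(videos) for v in videos}

    video_score = dict(prior)
    for _ in range(iterations):
        # Each video splits its score evenly among its keywords.
        keyword_score = {
            k: sum(video_score[v] / len(video_to_keywords[v])
                   for v in keyword_to_videos[k])
            for k in keywords
        }
        # Each keyword splits its score evenly among its videos.
        pushed_back = {
            v: sum(keyword_score[k] / len(keyword_to_videos[k])
                   for k in video_to_keywords[v])
            for v in videos
        }
        # Restart term keeps the scores anchored to the seed videos.
        video_score = {
            v: (1.0 - damping) * prior[v] + damping * pushed_back[v]
            for v in videos
        }

    ranking = sorted(videos, key=lambda v: -video_score[v])
    return ranking, video_score


# Toy example: v1 is the video of interest; v4 shares no keyword with it.
toy_edges = [
    ("v1", "quake"), ("v1", "rescue"),
    ("v2", "quake"), ("v2", "rescue"),
    ("v3", "quake"),
    ("v4", "cat"),
]
ranking, scores = propagate_scores(toy_edges, seeds={"v1": 1.0})
print(ranking)
```

Videos that share more keywords with the seed accumulate more score, while a video with no keyword path to the seed receives none, which mirrors the statement that relevant videos end up ranked above irrelevant ones.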

Content about Detection of Near-Duplicates in Web Video Database:

  We propose a hierarchical approach to solve this problem efficiently and effectively. First, we compute a hash code representation for each key-frame, from which global features are produced and used to coarsely exclude dissimilar videos. Then a bipartite graph is constructed whose edges reflect the similarity relationships between key-frames, and the Maximum Matching (MM) algorithm is applied to it to estimate the similarity of two videos. To speed up this step, an approximate MM algorithm is first used to quickly classify candidate videos as near-duplicates or novel ones. For the remaining videos, which are left undetermined by the steps above, we modify the bipartite graph by adding two nodes and their related edges, converting the MM problem into a max flow / min cut problem that can be solved efficiently with a graph cut algorithm.

Flow chart for feature comparison
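
The key-frame comparison stage can be sketched as follows. Several details are assumptions, since the page does not specify them: key-frame hash codes are modeled as integers compared by Hamming distance, the approximate MM step is a greedy maximal matching (whose size is always at least half of the maximum), and the exact step uses augmenting paths, which is the same computation as Ford-Fulkerson max flow on the unit-capacity network obtained by adding a source and a sink. The names `max_dist` and `ratio` are hypothetical parameters.

```python
def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def build_edges(frames_a, frames_b, max_dist):
    """adj[i] lists the key-frames of B that are similar to key-frame i of A."""
    return [
        [j for j, fb in enumerate(frames_b) if hamming(fa, fb) <= max_dist]
        for fa in frames_a
    ]


def greedy_matching(adj):
    """Greedy maximal matching: fast, and at least half the maximum size."""
    used = set()
    size = 0
    for neighbours in adj:
        for j in neighbours:
            if j not in used:
                used.add(j)
                size += 1
                break
    return size


def maximum_matching(adj, n_right):
    """Exact maximum bipartite matching via augmenting paths."""
    match_right = [-1] * n_right

    def augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(adj)))


def is_near_duplicate(frames_a, frames_b, max_dist=1, ratio=0.6):
    """Decide quickly with the greedy bound; fall back to exact matching."""
    n = min(len(frames_a), len(frames_b))
    if n == 0:
        return False
    adj = build_edges(frames_a, frames_b, max_dist)
    lower = greedy_matching(adj)      # lower bound on the maximum matching
    if lower >= ratio * n:
        return True                   # already enough matched key-frames
    if 2 * lower < ratio * n:
        return False                  # even the upper bound is too small
    return maximum_matching(adj, len(frames_b)) >= ratio * n


original = [0x00, 0xFF, 0x0F]
re_encoded = [0x01, 0xFE, 0x0F]       # each hash differs by at most one bit
unrelated = [0xF0, 0xAA, 0x55]
print(is_near_duplicate(original, re_encoded))
print(is_near_duplicate(original, unrelated))
```

Because the greedy result brackets the true matching size from below and, doubled, from above, most pairs can be classified without running the exact algorithm; only pairs that fall between the two bounds pay the higher cost.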

Content about Object Detection and Tracking:

  Joint trackers and mixture trackers are two extreme instantiations of the tradeoff between dependency and efficiency. In this paper, we address this tradeoff by introducing contexts into mixture trackers; the result is named the contextual mixture tracker (CMT). In CMT, the evolution of each local tracker depends only on its context, which consists of only a fraction of the targets and some scene priors. More specifically, CMT is implemented in the Sequential Monte Carlo framework (also known as the particle filter), which consists of two steps: 1) predict the next motion state from the current state according to the proposal distribution; and 2) verify the predictions against newly observed data using the observation model. We embed the contexts into these two steps as follows to form the contextual proposal distribution (CPD) and the contextual observation model (COM):

  1) Contextual proposal distribution: CPD is a cascaded sampling function, in which the particles for motion conditions are first sampled according to the tracker's current state and context, and then the particles for motion states are sampled from a motion-condition-specific proposal distribution. In this step, only spatial contexts are used, including the distances to other local trackers and to scene boundaries, and the states of nearby local trackers. These contexts help to explicitly predict the motion conditions and thereby improve the precision of motion prediction.

  2) Contextual observation model: The objective of the observation model is to evaluate the likelihoods of the predictions given new observation data. This is especially challenging when there are visually ambiguous targets nearby. To differentiate the local target from them, we propose to use spatial-temporal contexts to adaptively construct the observation models. First, the spatial context is used to find nearby targets. Then the historical appearances of these targets and of the local target (the temporal contexts) are used to evaluate the discriminative power of the features by a mutual-information-based feature-ranking likelihood method. More discriminative features are weighted higher in the observation model, and less discriminative ones lower. This ensures that COM is continuously adapted to different contexts and can effectively discriminate the local target from varying surrounding targets.

Motion-condition-specific particle proposal. The four layers, from top to bottom, correspond to weighted particles at time t-1, resampled particles, sampled motion conditions, and proposed particles. The five nodes in the motion condition layer represent Split, Merge, Normal update, Exit, and Enter.
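
The predict and verify cycle with a cascaded, condition-first proposal can be sketched in one dimension. This is a simplified illustration under stated assumptions rather than the CMT algorithm: only two of the five motion conditions ("normal" and "exit") are modeled, the only context is the distance to the scene boundary, the observation model is a single Gaussian likelihood with no feature weighting, and every probability and noise level is a made-up value.

```python
import math
import random


def sample_condition(x, width, margin, rng):
    """Stage 1: sample a motion condition from the spatial context.

    Assumption: a target close to a scene boundary is more likely to exit.
    """
    near_boundary = min(x, width - x) < margin
    p_exit = 0.5 if near_boundary else 0.05
    return "exit" if rng.random() < p_exit else "normal"


def propose_state(x, condition, rng):
    """Stage 2: sample a motion state from a condition-specific proposal."""
    sigma = 1.0 if condition == "normal" else 3.0   # exiting targets move more
    return x + rng.gauss(0.0, sigma)


def likelihood(x, observation, sigma=2.0):
    """Gaussian observation model (a placeholder for a contextual one)."""
    return math.exp(-0.5 * ((x - observation) / sigma) ** 2)


def filter_step(particles, observation, width, rng, margin=5.0):
    """One particle filter iteration: predict, verify, resample."""
    proposed = [
        propose_state(x, sample_condition(x, width, margin, rng), rng)
        for x in particles
    ]
    weights = [likelihood(x, observation) for x in proposed]
    total = sum(weights)
    if total == 0.0:                       # all particles far from the data
        weights = [1.0] * len(proposed)
        total = float(len(proposed))
    weights = [w / total for w in weights]
    estimate = sum(w * x for w, x in zip(weights, proposed))
    resampled = rng.choices(proposed, weights=weights, k=len(proposed))
    return resampled, estimate


rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
for _ in range(10):
    particles, estimate = filter_step(particles, 40.0, 100.0, rng)
print(round(estimate, 1))
```

Sampling the condition before the state lets each condition use its own motion model, which is the purpose of the cascaded layers shown in the figure; a fuller version would add the remaining conditions and replace `likelihood` with a weighted combination of feature likelihoods.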

Sponsored projects:

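Intel collaboration project "Performance Evaluation and Improvement in Video Information Mining Systems". (2005/05 – present)
Natural Science Foundation project "Unsupervised Pattern Mining Techniques for Video Structure Analysis and Event Detection". (2005/03 – present)
863 Program project "Research on Key Technologies of MPEG-4 and MPEG-7". (2004/10 – present)
Sino-German doctoral student cooperative research project "Cross-Modal Interaction in Natural and Artificial Cognitive Systems". (2004/06 – present)
973 Program project "Organization, Management, and Implementation Mechanisms of Massive Information and Their Application in Digital Libraries". (2001/10 – 2004/10)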


Lifeng Sun, Peng Wang, Fei Wang, Liu Lu, Chao Wang, Cui Peng


[1] Peng Cui, Li-Feng Sun, Fei Wang, Shi-Qiang Yang, Contextual Mixture Tracking, IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 335-344, Feb. 2009.
[2] Peng CUI, Zhiqiang LIU, Lifeng SUN, Shiqiang YANG, Hierarchical visual event pattern mining and its applications, Int. J. Data Mining and Knowledge Discovery, (DMKD, minor revision)
[3] P. Wang, R. Cai, and S.-Q. Yang, Improving Classification of Video Shots Using Information-Theoretic Co-Clustering, 2005 IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, 23-26, May, 2005
[4] P. Wang, R. Cai, and S.-Q. Yang, A tennis video indexing approach through pattern discovery in interactive process, in Proc. IEEE PCM04, LNCS, vol. 3331, Tokyo, Japan, Dec. 2004, pp. 49-56.
[5] P. WANG, R. CAI, S.-Q. YANG, Tennis Video Analysis based on Transformed Motion Vectors, 2004 International Conference on Image and Video Retrieval (CIVR 2004), LNCS vol. 3115, Dublin City University, Ireland, Jul. 21-23, 2004, pp. 79 – 87.
[6] P. Wang, R. Cai, and S.-Q. Yang, Contextual Browsing for Highlights in Sports Video, the 2004 IEEE International Conference on Multimedia and Expo (ICME 2004), Taipei, Taiwan, Jun. 27-30, 2004.
[7] P. Wang, R. Cai, and S.-Q. Yang, A Pinhole Camera Modeling of Motion Vector Field for Tennis Video Analysis, IEEE International Conference on Image Processing (ICIP 2004), Singapore, Oct. 24-27, 2004.
[8] P. Wang, Y.-F. Ma, H.-J Zhang and S.-Q. Yang, A People-Similarity Based Approach To Video Indexing, IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP 2003), Volume:3, Pages:693-697, Hong Kong, Apr. 6-10, 2003
[9] G. Xu, Y.-F. Ma, H.-J. Zhang, S.-Q. Yang, A HMM Based Semantic Analysis Framework for Sports Game Event Detection, IEEE Int. Conf. on Image Processing, 2003
[10] G. Xu, Y.-F. Ma, H.-J. Zhang, and S.-Q. Yang, Motion based event recognition using HMM, Proc. of the 16th International Conference on Pattern Recognition, Quebec, Canada, Vol.2, pp.831-834, Aug. 11-15, 2002
[11] C. Wu, Y.W. He, Li. Zhao, Y.Z. Zhong, Motion Feature Extraction Scheme for Content-based Video Retrieval, SPIE Electronic Imaging West, 2002, San Jose, USA.
[12] L. Zhao, S.Q. Yang, et al., Content-based Retrieval of Video Shot Using the Improved Nearest Feature Line Method, IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP), May, 2001, Salt Lake City, USA.

Copyright (C) 2005 Multimedia Group, Dept. of CS, Tsinghua University.
All rights reserved.