Video Analysis and Video Mining:
Web Video Topic Discovery and Tracking
Detection of Near-Duplicates in Web Video Database
Object Detection and Tracking
Introduction to Web Video Topic Discovery and Tracking:
Automatic topic discovery and tracking on web-shared videos can
greatly benefit both web service providers and end users. Most
existing solutions for topic detection and tracking were developed
for news and cannot be directly applied to web videos, because
web videos carry much less semantic information than news
videos. In this paper, we propose a bipartite graph model to
address this issue. The bipartite graph represents the correlation
between web videos and their keywords, and automatic topic
discovery is achieved through two steps – coarse topic filtering
and fine topic re-ranking. First, a weight-updating co-clustering
algorithm is employed to filter out topic candidates at a coarse
level. Then the videos on each topic are re-ranked by analyzing
the link structures of the corresponding bipartite graph. After the
topics are discovered, the interesting ones can also be tracked
over a period of time using the same bipartite graph model. The
key is to propagate relevance scores and keywords from the
videos of interest to other relevant ones through the bipartite
graph links. Experimental results on real web videos from YouKu,
a YouTube counterpart in China, demonstrate the effectiveness of
the proposed methods. We report very promising results.
Introduction to Detection of Near-Duplicates in Web Video Database:
With the explosive growth of online videos, web video databases
tend to contain a tremendous number of near-duplicates. How
to detect and eliminate these near-duplicates has become an
essential problem for web video storage and indexing. In this
paper, we propose a hierarchical approach to solve this problem
efficiently and effectively. For an incoming video, we first
compare global features to exclude videos that differ greatly
from it. Then a more computationally expensive pairwise
comparison of key-frame-level features is performed. For this
pairwise comparison, we adopt the Maximum Matching (MM)
framework on a bipartite graph. Instead of the standard well-known
MM algorithm, a simple approximate MM algorithm and a graph
cut algorithm are applied for higher speed.
Experimental results on two databases totaling more
than 20,000 videos demonstrate our algorithm’s excellent
performance in both speed and accuracy.
Introduction to Object Detection and Tracking:
Multiple Object Tracking (MOT) poses three challenges to conventional, well-studied Single Object Tracking (SOT) algorithms: 1) multiple targets make the configuration space exponential in the number of targets; 2) the multiple motion conditions caused by targets entering, exiting, and intersecting degrade the precision of the prediction process; 3) visual ambiguities among nearby targets make the trackers error-prone. In this paper, we address the MOT problem by embedding contextual proposal distributions and contextual observation models into a mixture tracker implemented in a particle filter framework. The proposal distributions are adaptively selected according to the motion conditions of the targets, which are determined from context information, and multiple features are combined according to their discriminative power between ambiguity-prone objects. Introducing the contextual proposal distribution and observation model helps overcome the conventional mixture tracker's inability to handle object occlusions while retaining its flexibility and high efficiency. Experiments show significant improvement over other methods in tracking scenarios with a variable number of objects.
Content about Web Video Topic Discovery and Tracking:
Topic discovery is achieved in two steps: coarse topic filtering
and fine topic re-ranking.
First, information-theoretic co-clustering is
employed to filter web video topics at a coarse level. This is an
unsupervised algorithm which uses the co-occurrence table of
the two node sets of the bipartite graph and obtains the clusters of
videos and keywords simultaneously. To reduce the influence of
noisy keywords, we propose a weight-updating strategy,
which assigns each keyword a weight reflecting its impact on the
co-occurrence table and updates the weights iteratively based on
the video cluster information. Then, the videos on the
discovered topics are re-ranked by analyzing the bipartite graph’s
link structure, which can be implemented as an iterative
reinforcement process. The re-ranking step can be treated as a fine
topic filtering step because, based on the re-ranking results,
website organizers can recommend the top N videos to customers
and remove the least relevant videos.
After the topics are discovered, the interesting ones can also be
tracked over a period of time using the same bipartite graph
model. The basic idea is to propagate relevance scores from
pre-defined videos and keywords to other relevant ones through
the bipartite graph’s links, which can also be achieved by an
iterative reinforcement process. After convergence, the relevant
videos will be ranked higher than irrelevant ones.
The framework of topic discovery and tracking by bipartite graph model
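The iterative reinforcement used for tracking can be illustrated with a minimal sketch. The update rule below is an assumption chosen for illustration (a random walk with restart over the video-keyword graph); the names `propagate_scores`, `alpha`, and `iters`, and the toy data, are hypothetical and not taken from the original work.

```python
def propagate_scores(video_keywords, seed_videos, alpha=0.8, iters=50):
    """Iteratively reinforce video and keyword scores over a bipartite graph.

    video_keywords: dict mapping video id -> set of keywords (the graph edges).
    seed_videos: videos of interest that anchor the topic being tracked.
    alpha: hypothetical damping factor (propagation vs. pull toward the seeds).
    """
    keywords = {k for kws in video_keywords.values() for k in kws}
    keyword_videos = {k: [v for v, kws in video_keywords.items() if k in kws]
                      for k in keywords}
    seed = {v: (1.0 if v in seed_videos else 0.0) for v in video_keywords}
    v_score = dict(seed)
    k_score = {k: 0.0 for k in keywords}
    for _ in range(iters):
        # Each video splits its score evenly among its keywords.
        k_score = {k: sum(v_score[v] / len(video_keywords[v]) for v in vs)
                   for k, vs in keyword_videos.items()}
        # Each keyword splits its score evenly among its videos; seeds keep a prior.
        v_score = {v: (1 - alpha) * seed[v]
                      + alpha * sum(k_score[k] / len(keyword_videos[k]) for k in kws)
                   for v, kws in video_keywords.items()}
    return v_score, k_score


# Toy data: v1-v3 share keywords with the seed video v1; v4 and v5 do not.
videos = {
    "v1": {"earthquake", "rescue"},
    "v2": {"earthquake", "sichuan"},
    "v3": {"rescue", "sichuan"},
    "v4": {"basketball", "nba"},
    "v5": {"nba", "dunk"},
}
scores, _ = propagate_scores(videos, seed_videos={"v1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, videos connected to the seed through shared keywords receive positive scores, while videos in the disconnected component stay at zero, so sorting by score places the relevant videos first.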
Content about Detection of Near-Duplicates in Web Video Database:
We propose a hierarchical approach to solve
this problem efficiently and effectively. First, we compute a
hash code representation for each key-frame, from which
global features are produced and used to coarsely exclude
dissimilar videos. Then a bipartite graph is constructed
whose edges reflect similarity relationships between
key-frames, and we apply the Maximum Matching
(MM) algorithm to it to estimate the similarity of two videos. To
speed up this step, an approximate MM algorithm is
first used to quickly classify candidate videos as near-duplicates or
novel ones. For the remaining videos, which are
not decided by the steps above, we modify the bipartite graph
by adding two nodes and related edges to convert the MM
problem into a max-flow / min-cut problem, which can be solved
efficiently using a graph cut algorithm.
Flow chart for feature comparison
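The key-frame stage of this hierarchy can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: key-frame hashes are plain integers compared by Hamming distance, the approximate step is a greedy maximal matching (whose size is at least half the maximum), and the exact step uses augmenting paths, which is equivalent to max flow on the unit-capacity network obtained by adding a source and a sink. The names `max_bits` and `threshold` and the decision rule are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def build_edges(frames_a, frames_b, max_bits=2):
    """adj[i] lists key-frames of B whose hash is within max_bits of A's frame i."""
    return [[j for j, hb in enumerate(frames_b) if hamming(ha, hb) <= max_bits]
            for ha in frames_a]


def greedy_matching(adj):
    """Greedy maximal matching: a fast lower bound on the maximum matching size."""
    used, size = set(), 0
    for nbrs in adj:
        for j in nbrs:
            if j not in used:
                used.add(j)
                size += 1
                break
    return size


def maximum_matching(adj, n_right):
    """Exact maximum matching via augmenting paths (unit-capacity max flow)."""
    match_right = [-1] * n_right

    def augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(adj)))


def is_near_duplicate(frames_a, frames_b, threshold=0.6, max_bits=2):
    """Decide with the cheap bound when possible, else fall back to the exact MM."""
    adj = build_edges(frames_a, frames_b, max_bits)
    n = min(len(frames_a), len(frames_b))
    if n == 0:
        return False
    lower = greedy_matching(adj)
    if lower / n >= threshold:               # lower bound already passes
        return True
    if min(2 * lower, n) / n < threshold:    # upper bound cannot pass
        return False
    return maximum_matching(adj, len(frames_b)) / n >= threshold
```

Because a maximal matching is never smaller than half of the maximum matching, the greedy result brackets the true similarity, and only the undecided pairs pay for the exact computation.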
Content about Object Detection and Tracking:
The joint trackers and mixture trackers are two extreme instantiations
of the tradeoff between dependency and efficiency.
In this paper, we address this tradeoff by introducing contexts
into mixture trackers; we call the result the contextual mixture
tracker (CMT). In CMT, the evolution of each local tracker
depends only on its context, which consists of only
a fraction of the targets and some scene priors.
More specifically, the CMT is implemented in a Sequential
Monte Carlo framework (also known as a particle filter), which
consists of two steps: 1) predict the next motion state based on
the current state according to the proposal distribution; and 2)
verify the predictions based on new observed data using the
observation model. We embed the contexts into these two steps
as follows to form the contextual proposal distribution (CPD)
and contextual observation model (COM):
1) Contextual proposal distribution: CPD is a cascaded sampling
function, where the particles for motion conditions are first
sampled according to the tracker's current state and context, and then the
particles for motion states are sampled by a motion-condition-specific
proposal distribution. In this step, only spatial contexts
are used, including the distances between the local trackers/scene
boundaries, and the states of nearby local trackers. These contexts help to explicitly predict the motion conditions, and thereby improve the precision of motion prediction.
2) Contextual observation model: The objective of the observation
model is to evaluate the prediction likelihoods given new
observation data. It is especially challenging when there are visually
ambiguous targets nearby. In order to differentiate the
local target from them, we propose to use spatial-temporal contexts
to adaptively construct the observation models. First, the
spatial context is used to find nearby targets. Then the historical
appearances of these targets and the local target (temporal
contexts) are used to evaluate the discriminative power of the
features by a mutual-information-based feature ranking likelihood
method. More discriminative features are weighted higher
in the observation model and vice versa. This guarantees that
COM is continuously adapted to different contexts and can effectively
discriminate the local target from variable surrounding
Motion-condition-specific particle proposal. The four top-down layers
correspond respectively to weighted particles at time t-1, resampled particles,
sampled motion conditions, and proposed particles. The five nodes in the motion
condition layer represent Split, Merge, Normal update, Exit, and Enter.
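The predict/verify cycle with a cascaded proposal can be illustrated with a toy one-dimensional particle filter. Everything specific here is an assumption for illustration: only two motion conditions ("normal" and "merge") are modeled instead of the five listed above, the context is reduced to the distance to a single neighbor, the observation model is a plain Gaussian rather than the contextual one, and the proposal doubles as the motion model so that the importance weight reduces to the likelihood.

```python
import math
import random


def condition_probs(x, neighbor_x, near=2.0):
    """Context-dependent probabilities of motion conditions (toy assumption)."""
    if abs(x - neighbor_x) < near:
        return {"normal": 0.5, "merge": 0.5}
    return {"normal": 0.95, "merge": 0.05}


def propose(x, velocity, neighbor_x, rng):
    """Cascaded sampling: first a motion condition, then a state given it."""
    probs = condition_probs(x, neighbor_x)
    cond = rng.choices(list(probs), weights=list(probs.values()))[0]
    sigma = 0.3 if cond == "normal" else 1.0   # search wider when merging
    return x + velocity + rng.gauss(0.0, sigma), cond


def likelihood(x, observation, sigma=0.5):
    """Gaussian observation model, standing in for the contextual one."""
    return math.exp(-0.5 * ((x - observation) / sigma) ** 2)


def step(particles, velocity, neighbor_x, observation, rng):
    """One predict / verify / resample cycle of the particle filter."""
    proposed = [propose(x, velocity, neighbor_x, rng)[0] for x in particles]
    weights = [likelihood(x, observation) for x in proposed]
    if sum(weights) == 0.0:        # every particle is far off: skip resampling
        return proposed
    return rng.choices(proposed, weights=weights, k=len(particles))


rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(300)]
for t in range(1, 11):
    true_x = float(t)              # the target moves one unit per frame
    observation = true_x + rng.gauss(0.0, 0.2)
    particles = step(particles, 1.0, 5.0, observation, rng)
estimate = sum(particles) / len(particles)
```

The neighbor at position 5.0 makes the "merge" condition likely only while the target passes close to it, so the wider proposal is used exactly where the motion is hardest to predict.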
Projects with sponsor:
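Intel collaboration project "Performance Evaluation and Improvement in Video Information Mining Systems". (2005/05 – present)
Natural Science Foundation project "Unsupervised Pattern Mining Techniques for Video Structure Analysis and Event Detection". (2005/03 – present)
863 Program project "Research on Key Technologies of MPEG-4 and MPEG-7". (2004/10 – present)
Sino-German doctoral student collaborative research project "Cross-Modal Interaction in Natural and Artificial Cognitive Systems". (2004/06 – present)
973 Program project "Organization, Management, and Implementation Mechanisms of Massive Information and Their Application in Digital Libraries". (2001/10 – 2004/10)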
Lifeng Sun, Peng Wang, Fei Wang, Liu lu, Chao Wang, Cui Peng
 Peng Cui, Li-Feng Sun, Fei Wang, Shi-Qiang Yang, Contextual Mixture Tracking, IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 335-344, Feb. 2009.
 Peng CUI, Zhiqiang LIU, Lifeng SUN, Shiqiang YANG, Hierarchical visual event pattern mining and its applications, Int. J. Data Mining and Knowledge Discovery, (DMKD, minor revision)
 P. Wang, R. Cai, and S.-Q. Yang, Improving Classification of Video Shots Using Information-Theoretic Co-Clustering, 2005 IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, 23-26, May, 2005
 P. Wang, R. Cai, and S.-Q. Yang, A tennis video indexing approach through pattern discovery in interactive process, in Proc. IEEE PCM04, LNCS, vol. 3331, Tokyo, Japan, Dec. 2004, pp. 49-56.
 P. WANG, R. CAI, S.-Q. YANG, Tennis Video Analysis based on Transformed Motion Vectors, 2004 International Conference on Image and Video Retrieval (CIVR 2004), LNCS vol. 3115, Dublin City University, Ireland, Jul. 21-23, 2004, pp. 79 – 87.
 P. Wang, R. Cai, and S.-Q. Yang, Contextual Browsing for Highlights in Sports Video, the 2004 IEEE International Conference on Multimedia and Expo (ICME 2004), Taipei, Taiwan, Jun. 27-30, 2004.
 P. Wang, R. Cai, and S.-Q. Yang, A Pinhole Camera Modeling of Motion Vector Field for Tennis Video Analysis, IEEE International Conference on Image Processing (ICIP 2004), Singapore, Oct. 24-27, 2004.
 P. Wang, Y.-F. Ma, H.-J Zhang and S.-Q. Yang, A People-Similarity Based Approach To Video Indexing, IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP 2003), Volume:3, Pages:693-697, Hong Kong, Apr. 6-10, 2003
 G. Xu, Y.-F. Ma, H.-J. Zhang, S.-Q. Yang, A HMM Based Semantic Analysis Framework for Sports Game Event Detection, IEEE Int. Conf. on Image Processing, 2003
 G. Xu, Y.-F. Ma, H.-J. Zhang, and S.-Q. Yang, Motion based event recognition using HMM, Proc. of the 16th International Conference on Pattern Recognition, Quebec, Canada, Vol.2, pp.831-834, Aug. 11-15, 2002
 C.Wu, Y.W.He, Li.Zhao, Y.Z.Zhong, Motion Feature Extraction Scheme for Content-based Video Retrieval, SPIE Electronic Imaging West, 2002, San Jose, USA.
 L. Zhao, S.Q. Yang, et al., Content-based Retrieval of Video Shot Using the Improved Nearest Feature Line Method, IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP), May 2001, Salt Lake City, USA.
Demos and Resources: