Description
Submission date: 1 January 1900
Title: Face Tracking and Indexing in Video Sequences
Thesis supervisor:
Fakhreddine ABABSA
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
The development of robust face processing algorithms capable of functioning in unconstrained,
real-world environments will have far-reaching applications in our modern digital world. Much
research is under way in the general field of face processing, which comprises specific applications
such as face detection, tracking, recognition, identification, verification, pair matching, and
reconstruction. The wide variety of proposed algorithms share many common steps.
At the core of the challenge is the extreme diversity of viewpoint, lighting, clutter,
occlusion, and environment. With rapid technological developments, faces now appear not
only in static images but also in videos, making the problem even more challenging.
Head pose tracking [ybanez-cvpr-2007] is an important issue in the tracking community and has
received considerable attention in the last decade because of its many applications
[murphy-pami-2008]. These applications include video surveillance, human–computer interfaces,
and biometrics. However, face tracking remains a major challenge due, in part, to nonrigid motion,
appearance variations, illumination changes, and occlusions.
When tracking objects in video sequences, observations arrive sequentially, and predictions are
computed as soon as each image frame arrives. Recently, sequential Monte Carlo (SMC) techniques
[doucet-springer-2001] have received considerable attention because of their ability to escape
from local minima and their applicability to non-Gaussian data. The condensation algorithm, also
known as the bootstrap filter, is an example of an SMC method. On the other hand, since the
pioneering work of Cootes and Edwards [cootes-pami-2001], active appearance models (AAMs)
for deformable objects have provided estimates of the parameters that adjust the model to match
the current video frame closely. The value of the evaluation function being minimized, typically
with a Gauss–Newton method, may be viewed as a confidence level for the fit.
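To make the SMC idea concrete, the following is a minimal sketch of a bootstrap filter, not the method developed in the thesis. It assumes a one-dimensional state (for example, head yaw in degrees), a random-walk motion model, and a Gaussian likelihood; the function name and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, motion_std=2.0,
                     obs_std=5.0, seed=0):
    """Track a 1-D state (e.g. head yaw in degrees) with a bootstrap filter.

    Assumptions (illustrative only): random-walk dynamics with standard
    deviation motion_std and Gaussian observation noise with standard
    deviation obs_std.
    """
    rng = random.Random(seed)
    # Initialise the particles around the first observation.
    particles = [observations[0] + rng.gauss(0.0, obs_std)
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Prediction: propagate each particle through the motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Observation: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean under the weighted particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Because the posterior is carried by a set of samples rather than a single mode, this kind of filter can represent non-Gaussian, multi-modal beliefs, which is the property the paragraph above attributes to SMC methods. In a real tracker the likelihood would come from an appearance model rather than a scalar Gaussian.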
These applications also face the usual problem of labeled databases.
Such databases are needed to train various models, such as AAMs. Labeled
databases are also needed to evaluate the real contributions of the multitude of newly
proposed algorithms, including large-scale machine learning methods such as
[sonnenburg-jmlr-2006]. Reported results often appear to outperform existing “baseline” systems,
but in the majority of cases they are evaluated on a single database, so it is difficult to say
whether the good performance is partly due to biases in the evaluation database. Even though
publicly available databases keep growing, new datasets are still needed. Pinto
et al. [pinto-cvpr-2009] propose additionally using synthetic evaluation databases.
Recent results reported by Zhou et al. [zhou-icpr-2010] show that the face analysis and synthesis
framework initially proposed by Blanz and Vetter [blanz-siggraph-1999] is also useful for
generating synthetic 3D faces with automatically generated landmarks, which can be exploited to
train a landmark detection algorithm. Usually, manually annotated landmarks are used for the
learning phase. This analysis and synthesis framework, based on a 3D Morphable Model (3DMM),
is therefore a good candidate for generating synthetic data for learning methods (including head
pose estimation methods) and for evaluation purposes.
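The reason synthetic faces come with landmarks "for free" can be illustrated with a toy linear shape model. The sketch below is an assumption-laden simplification, not the Blanz and Vetter model or the Zhou et al. pipeline: it uses five hand-picked vertices instead of a dense mesh, a single made-up deformation mode, no texture, and an orthographic projection. All names and numbers are hypothetical.

```python
import math
import random

# Toy "mean face": five 3-D vertices (x, y, z) in arbitrary units.
MEAN_SHAPE = [
    (-30.0, 30.0, 0.0),   # left eye
    (30.0, 30.0, 0.0),    # right eye
    (0.0, 0.0, 25.0),     # nose tip
    (-20.0, -30.0, 5.0),  # left mouth corner
    (20.0, -30.0, 5.0),   # right mouth corner
]
# One made-up deformation mode that widens or narrows the face along x.
MODES = [[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
          (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]]
SIGMAS = [4.0]  # standard deviation of each mode


def sample_shape(rng):
    """Draw a 3-D shape: mean + sum_i alpha_i * sigma_i * mode_i."""
    alphas = [rng.gauss(0.0, 1.0) for _ in MODES]
    shape = []
    for i, (x, y, z) in enumerate(MEAN_SHAPE):
        for alpha, sigma, mode in zip(alphas, SIGMAS, MODES):
            dx, dy, dz = mode[i]
            x += alpha * sigma * dx
            y += alpha * sigma * dy
            z += alpha * sigma * dz
        shape.append((x, y, z))
    return shape


def project(shape, yaw_deg):
    """Rotate about the vertical axis, then project orthographically."""
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y) for x, y, z in shape]


def synthesize_sample(rng, max_yaw_deg=60.0):
    """Return 2-D landmarks together with the yaw that produced them."""
    yaw_deg = rng.uniform(-max_yaw_deg, max_yaw_deg)
    landmarks = project(sample_shape(rng), yaw_deg)
    return landmarks, yaw_deg
```

Each call to `synthesize_sample(random.Random(0))` yields a set of 2-D landmark positions and the exact pose label, with no manual annotation, because both are known by construction. This is the property that makes such a framework attractive for training and evaluation.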
In the first part of the study, the student will start with commonly used Bayesian approaches that
combine a dynamical motion model (prediction) with a likelihood measure
(observation) computed in the new image (incoming frame) using an appearance model. Part
of this study will relate to the ANR project ORIGAMI, which is concerned with
tracking a customer’s gaze in a supermarket, or more precisely the head pose, because in the
situation considered we have no access to the iris location. This application faces the
usual problem of the availability of relevant labeled databases. We will exploit the results
obtained in [zhou-icpr-2010] to synthesize a useful database for training and testing.
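The combination of prediction and observation mentioned at the start of this part is usually written as the standard two-step Bayesian filtering recursion. The notation here is assumed for illustration and does not come from the project: x_t denotes the head pose state at frame t and z_t the image observation.

```latex
\begin{align}
p(x_t \mid z_{1:t-1}) &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, \mathrm{d}x_{t-1}
  && \text{(prediction)} \\
p(x_t \mid z_{1:t}) &= \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}
  {\int p(z_t \mid x_t')\, p(x_t' \mid z_{1:t-1})\, \mathrm{d}x_t'}
  && \text{(observation update)}
\end{align}
```

Here p(x_t | x_{t-1}) plays the role of the dynamical motion model and p(z_t | x_t) the likelihood measured in the incoming frame with the appearance model.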
The head pose estimation and tracking will furthermore be integrated into newly proposed robust
face processing algorithms capable of functioning in unconstrained, real-world environments
and exploiting the temporal information available in video sequences.
Doctoral student: Tran Ngoc Trung