Doctoral research project number: 3919

Description

Submission date: 1 January 1900
Title: Face Tracking and Indexing in Video Sequences
Thesis supervisor: Fakhreddine ABABSA
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined

Abstract: The development of robust face processing algorithms capable of functioning in unconstrained, real-world environments will have far-reaching applications in the modern digital world. Much research is under way in the general field of face processing, which comprises specific applications such as face detection, tracking, recognition, identification, verification, pair matching, and reconstruction. The wide variety of proposed algorithms share many common steps. At the core of the challenge is the extreme diversity of viewpoint, lighting, clutter, occlusion, and environment. With rapid technological developments, faces now appear not only in static images but also in videos, making the problem even more challenging. Head pose tracking [ybanez-cvpr-2007] is an important issue in the tracking community and has received considerable attention in the last decade because of the many applications involved [murphy-pami-2008]. These applications include video surveillance, human–computer interfaces, and biometrics. However, face tracking remains a major challenge due, in part, to nonrigid motion, appearance variations, illumination changes, and occlusions. When tracking objects in video sequences, observations arrive sequentially, and predictions are made as soon as image frames arrive. Recently, sequential Monte Carlo (SMC) techniques [doucet-springer-2001] have received considerable attention because of their ability to escape from local minima and their applicability to non-Gaussian data. The condensation algorithm, also known as the bootstrap filter, is an example of an SMC method. On the other hand, since the pioneering work of Cootes and Edwards [cootes-pami-2001], active appearance models (AAMs) for deformable objects have provided an estimate of the parameters that adjust the model to closely match the current video frame.
The evaluation function to be minimized, typically with a Gauss–Newton method, may be viewed as a confidence level. For the above-mentioned applications we also face the usual problem of labeled databases. They are needed to train various models, such as active appearance models (AAMs). Labeled databases are also needed to evaluate the real contributions of the multitude of newly proposed algorithms, including large-scale machine learning methods such as [sonnenburg-jmlr-2006]. The reported evaluation results often seem to outperform the existing “baseline” systems, and in the majority of cases they are obtained on a single database. It is difficult to say whether this good performance is partly due to biases in the evaluation database. Even though publicly available databases are becoming larger and larger, new datasets are still needed. Pinto et al. [pinto-cvpr-2009] propose additionally using synthetic evaluation databases. Recent results reported by Zhou et al. [zhou-icpr-2010] show that the face analysis and synthesis framework initially proposed by Blanz and Vetter [blanz-siggraph-1999] is also useful for generating synthetic 3D faces with automatically generated landmarks, which can be exploited to train a landmark detection algorithm. Usually, manually annotated landmarks are used for the learning phase. This analysis and synthesis framework with a 3D Morphable Model (3DMM) is therefore a good candidate for generating synthetic data for learning methods (including head pose estimation methods) and for evaluation purposes. In the first part of the study, the student will start with commonly used Bayesian approaches, which combine a dynamical motion model (prediction) with a “likelihood” measure (observation) computed on the new image (incoming frame) using an appearance model.
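As a rough illustration of the prediction/observation loop used by such Bayesian trackers, the following minimal sketch implements a bootstrap (condensation-style) particle filter for a single scalar state, for example a head yaw angle in degrees. The random-walk motion model, the Gaussian likelihood around a noisy per-frame measurement, the function name bootstrap_filter and all parameter values are assumptions chosen for illustration; they are not taken from the project, where the likelihood would be computed from an appearance model on the incoming frame.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, motion_std=2.0,
                     obs_std=5.0, seed=0):
    """Return one state estimate per observation (hypothetical scalar tracker)."""
    rng = random.Random(seed)
    # Initialise the particle set around the first measurement (assumption).
    particles = [rng.gauss(observations[0], obs_std)
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Prediction: propagate every particle through the motion model
        # (here an assumed random walk).
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Observation: weight each particle by an assumed Gaussian
        # likelihood of the measurement z given the particle state.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resampling: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling bootstrap_filter with a list of per-frame measurements returns one estimate per frame. In a real tracker the state would be multidimensional (position, scale, 3D pose) and the Gaussian term would be replaced by an image-based likelihood.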
Part of this study will be related to the ANR project ORIGAMI, which is concerned with tracking a customer's gaze in a supermarket, more precisely the head pose, because in this setting the iris location is not accessible. In this application we face the usual problem of the availability of relevant labeled databases. We will exploit the results obtained in [zhou-icpr-2010] to synthesize useful databases for training and testing. Head pose estimation and tracking will furthermore be integrated into newly proposed robust face processing algorithms capable of functioning in unconstrained, real-world environments and exploiting the temporal information found in video sequences.

Doctoral student: Tran Ngoc Trung