Description
Deposit date: 1 January 1900
Titre: Deep learning and kernel networks for visual scene recognition
Thesis supervisor:
Hichem SAHBI (LIP6)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
Progress in machine learning and artificial intelligence is nowadays largely driven by the successful resurgence of deep neural networks and by hardware resources that make learning these models on big data far more tractable. These models are also theoretically well grounded and rely on highly efficient and effective optimization algorithms. All these technical developments, coupled with the era of big data, have led to major breakthroughs and success stories in pattern recognition and neighboring fields, such as visual/speech recognition and machine translation, where artificial systems currently match (and sometimes beat) human performance (see for instance [3]). In the particular context of visual recognition, the exponential growth of image and video collections on the web (and in social networks) puts their manual annotation and search completely out of reach. At this growth rate, there is an urgent need for reliable automatic solutions able to annotate and search these large collections. Image and video annotation is a major challenge in computer vision; it consists in assigning (and possibly localizing) concepts in flows of visual content using a variety of machine learning and inference techniques (see for instance [1-7]).
Among machine learning techniques, those based on deep learning (DL) are particularly successful [11]. These DL methods usually operate on vectorial data: the underlying parametric models take raw vectorial data as inputs and return discriminatively-trained representations. However, some interesting problems (such as video analysis in social networks) require handling non-vectorial data such as graphs, where (sometimes) only the relationships between data are available. With graphs, existing deep representation learning methods usually proceed in two steps: first, a preliminary process vectorizes the input graphs; then, Siamese-type networks (e.g. [9,10]) learn and extract deep representations/similarities on these vectorized data (a minimal sketch of this two-step baseline is given below). The loss of structural information that may result from graph vectorization, and its impact on classification performance, call for alternative deep representation learning solutions on graphs that bypass this preliminary vectorization step.
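As a rough illustration (not part of the thesis proposal), the following Python sketch shows the two-step baseline described above: graphs are first vectorized with a deliberately simple, hypothetical descriptor (a node-degree histogram, which discards most structural information), and a small Siamese network is then trained on these vectors with a contrastive loss. All function and layer names are illustrative assumptions.

import networkx as nx
import torch
import torch.nn as nn

def degree_histogram(g: nx.Graph, max_degree: int = 16) -> torch.Tensor:
    """Step 1 (illustrative): vectorize a graph as a normalized node-degree histogram.
    Any structure beyond degrees is lost at this step."""
    hist = torch.zeros(max_degree + 1)
    for _, d in g.degree():
        hist[min(d, max_degree)] += 1.0
    return hist / max(g.number_of_nodes(), 1)

class SiameseEncoder(nn.Module):
    """Step 2 (illustrative): shared branch of a Siamese network mapping vectorized
    graphs to embeddings."""
    def __init__(self, in_dim: int = 17, emb_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, emb_dim))
    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same: torch.Tensor, margin: float = 1.0):
    """Pull embeddings of same-class pairs together, push different-class pairs apart."""
    d = torch.norm(z1 - z2, dim=-1)
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

# Toy usage: two graph pairs, labels 1 = same class, 0 = different class.
g1, g2, g3 = nx.cycle_graph(6), nx.cycle_graph(8), nx.star_graph(7)
x = torch.stack([degree_histogram(g) for g in (g1, g2, g3)])
enc = SiameseEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-2)
for _ in range(100):
    z = enc(x)
    loss = contrastive_loss(z[[0, 0]], z[[1, 2]], same=torch.tensor([1.0, 0.0]))
    opt.zero_grad(); loss.backward(); opt.step()

The point of the sketch is the order of operations: whatever structure the hand-crafted descriptor does not capture is irretrievably lost before the Siamese network ever sees the data, which is precisely the limitation this thesis aims to bypass.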
The major goal of this thesis is to achieve deep learning on semi-structured data (mainly graphs); in this topic, we pay particular attention to kernel-based deep representations [12,13,14]. Kernels are positive semi-definite similarity functions which can be written as dot products in high-dimensional Hilbert spaces using implicit kernel representations [8]. Despite being implicit, these representations are suitable for achieving deep learning directly in those Hilbert spaces. The objective here is to extend usual graph kernels (such as graphlet-based and random walk kernels [15]), which produce relatively shallow representations, in order to make these kernels (and hence their representations) deeper. A deep kernel is defined as a recursive, multi-layered combination of standard kernels that captures simple (linear) as well as intricate (nonlinear) relationships. Learning the parameters of a deep kernel network together with the classifiers makes it possible to extend deep learning to semi-structured (non-vectorial) data such as graphs; a schematic sketch of such a network is given below.
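Purely as an illustration of the kind of architecture envisaged (layer sizes, base kernels, and names below are assumptions, not the thesis design), the sketch builds a small deep kernel network: layer-0 kernels are standard vectorial kernels standing in for the graph kernels mentioned above, each subsequent layer recombines the previous layer's kernels with learnable nonnegative weights followed by an elementwise exponential (both operations preserve positive semi-definiteness), and the resulting kernel is trained jointly with a simple hinge-loss classifier.

import torch
import torch.nn as nn

def base_kernels(x, y):
    """Layer-0 kernels on vectorial inputs (illustrative): a scaled linear kernel and
    RBF kernels at two bandwidths. For graphs, these would be replaced by graph
    kernels (graphlet-based, random walk, ...)."""
    d2 = torch.cdist(x, y).pow(2)
    return torch.stack([x @ y.T / x.shape[1], torch.exp(-d2), torch.exp(-0.1 * d2)])  # (3, n, m)

class DeepKernelLayer(nn.Module):
    """One layer: each output kernel is an elementwise exponential of a convex
    combination of the input kernels, which preserves positive semi-definiteness."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.log_w = nn.Parameter(torch.zeros(n_out, n_in))
    def forward(self, K):                      # K: (n_in, n, m)
        w = torch.softmax(self.log_w, dim=1)   # nonnegative weights, summing to 1 per unit
        return torch.exp(torch.einsum('oi,inm->onm', w, K))

class DeepKernelClassifier(nn.Module):
    """Deep kernel K_deep plus a kernel expansion f(x) = sum_j alpha_j K_deep(x, x_j),
    with the kernel combination weights and alpha learned jointly."""
    def __init__(self, n_support):
        super().__init__()
        self.layers = nn.ModuleList([DeepKernelLayer(3, 4), DeepKernelLayer(4, 1)])
        self.alpha = nn.Parameter(torch.zeros(n_support))
    def forward(self, x, support):
        K = base_kernels(x, support)
        for layer in self.layers:
            K = layer(K)
        return K[0] @ self.alpha               # (n,)

# Toy usage: jointly learn the kernel combination weights and the classifier.
torch.manual_seed(0)
X = torch.randn(20, 5); y = (X[:, 0] > 0).float() * 2 - 1
model = DeepKernelClassifier(n_support=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    out = model(X, X)
    loss = torch.clamp(1 - y * out, min=0).mean()   # hinge loss
    opt.zero_grad(); loss.backward(); opt.step()

Replacing base_kernels with graphlet-based or random-walk graph kernels evaluated on pairs of graphs would give the graph variant targeted by the thesis, with the combination weights still learned end-to-end together with the classifier.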
PhD candidate: Mazari Ahmed