Submission date: 12 April 2022
Title: Frugal Learning with Deep Generative Networks for Visual Scene Recognition
Thesis supervisor: Hichem SAHBI (LIP6)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence

Abstract: Deep learning is currently enjoying major success in computer vision tasks such as image and video classification. Its purpose is to train convolutional or transformer networks that map raw data into suitable representations prior to classification. However, the success of these networks depends heavily on the availability of large collections of hand-labeled training data that capture the distribution of the learned categories. In many practical scenarios, mainly those involving streams of data (videos, etc.), large collections covering the inherent variability of the learned categories are neither available nor processable as a whole. Hence, deep networks should be trained as part of a frugal lifelong process, a.k.a. continual or incremental learning. In lifelong learning, however, each task involves only part of the data or categories, which can lead to catastrophic forgetting (CF), defined as the inability of a learning model to “memorize” previous tasks when handling new ones. Whereas CF can be overcome in most learning models (especially shallow ones), handling it in deep neural networks remains a major challenge, and existing solutions can only mitigate its effect. Indeed, CF results from the high non-linearity and entanglement of gradients during back-propagation in deep networks (in contrast to shallow ones).
Existing straightforward solutions bypass this effect by storing huge collections of data and replaying the learning process on all of them. Although replay is highly effective, it is time- and memory-demanding and may result in resource saturation even on high-end hardware. Other solutions with a smaller time and memory footprint (e.g., regularization) can only mitigate the effect of CF. Another category of methods, based on dynamic networks, provides a suitable balance between resource consumption and task memorization.

The goal of this thesis is to design novel deep continual learning models for visual recognition. One of the main challenges is to design discriminative as well as generative networks that learn visual categories effectively while also attenuating catastrophic forgetting. The proposed solutions will build upon deep variational auto-encoders (VAEs) and generative adversarial networks (GANs), which allow CF to be mitigated with a reasonable growth in the number of trainable parameters and in memory footprint. The objectives include, but are not limited to:
(i) the design of new GANs/VAEs for image generation, augmentation and replay;
(ii) the disentanglement and interpretation of the different factors in these generative models, including the semantics, appearance and dynamics of the visual content;
(iii) the integration of these generative models into a complete framework that achieves continual learning while also handling the challenging issue of catastrophic forgetting;
(iv) the design of these networks so that they run not only on standard GPUs but also on edge devices, including mobile phones and connected objects with limited computational and energy resources;
(v) applications to continual object detection, image classification and segmentation in still images and video sequences.
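
As an illustration of replay with a generative model instead of a stored data collection, the following minimal Python sketch builds one training batch that mixes real samples from the current task with generated pseudo-samples of earlier categories. It is only a sketch under stated assumptions: `make_replay_batch`, `toy_generator` and `toy_old_model` are hypothetical names introduced here, and the toy generator and labeling rule merely stand in for a trained VAE/GAN and a frozen copy of the previous classifier.

```python
import random


def make_replay_batch(new_samples, generator, old_model, n_replay, rng):
    """Mix current-task samples with generated pseudo-samples of earlier tasks.

    new_samples: list of (x, label) pairs from the current task.
    generator:   callable rng -> x, standing in for a trained VAE/GAN (assumption).
    old_model:   callable x -> label, standing in for the previous classifier (assumption).
    """
    # Draw pseudo-samples instead of reading stored data from earlier tasks.
    replay_x = [generator(rng) for _ in range(n_replay)]
    # Label the pseudo-samples with the previous model.
    replay = [(x, old_model(x)) for x in replay_x]
    # Merge and shuffle so that old and new categories are interleaved.
    batch = list(new_samples) + replay
    rng.shuffle(batch)
    return batch


def toy_generator(rng):
    # Hypothetical generator: two well-separated 2-D blobs for old classes 0 and 1.
    center = rng.choice([-2.0, 2.0])
    return (center + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))


def toy_old_model(x):
    # Hypothetical previous classifier: thresholds the first coordinate.
    return 1 if x[0] > 0 else 0


rng = random.Random(0)
# Current task: a new category (label 2) located away from the old ones.
new_samples = [((rng.gauss(0.0, 0.1), 5.0 + rng.gauss(0.0, 0.1)), 2) for _ in range(8)]
batch = make_replay_batch(new_samples, toy_generator, toy_old_model, n_replay=8, rng=rng)
labels = [y for _, y in batch]
```

In this sketch the memory cost of rehearsal is the size of the generator rather than the size of the past data, which is the trade-off the generative approach aims for; the quality of the rehearsal then depends entirely on how faithfully the generator reproduces the earlier categories.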