Doctoral research project number: 8310


Submission date: 11 April 2022
Title: Improving few-shot learning through latent space topology of deep audio generative models
Thesis supervisor: Philippe Joseph Rene ESLING (STMS)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence

Abstract: The recent advent of generative models has yielded impressive results in diverse application fields, notably audio synthesis. However, the major drawback of almost all of these methods is that they require massive datasets of examples and, consequently, extensive training times. This is a problem when the number of examples is inherently limited or very costly to obtain, and it also raises energy and environmental concerns. Furthermore, this need for extensive datasets calls into question the true generalization abilities of these models. Recently, the field of Few-Shot Learning (FSL) has tried to tackle these issues directly by searching for models that can generalize from only a few examples. However, most approaches rely on pre-training with large datasets or on expensive augmentation strategies. The goal of this PhD is to develop novel methods that address few-shot learning in deep audio synthesis without relying on extensive pre-training. Our main assumption is that the underlying topology of deep latent variable models can provide keys to understanding when the properties of a dataset (number of examples, modes and their relative densities) are sufficient for accurate training. Hence, this PhD will extend our previous work on deep audio synthesis by incorporating geometry-aware methods and training. First, we will analyze how different sub-samplings of the same dataset affect the topology of latent models and, consequently, their quality. This will also allow us to study mode collapse and correlate it with the relative densities of the different modes. Then, we will rely on tools from information geometry and Riemannian topology to regularize the learning. Our goal is to study latent space kinematics in order to improve the generalization of deep generative models trained with only a few examples.
This PhD can lead to efficient audio models, but also to generic approaches for improving few-shot learning and the generalization abilities of generative models.