Doctoral research project number: 8363


Submission date: 18 July 2022
Title: Adaptive representation learning for the gestural control of deep audio generative models
Thesis supervisor: Frédéric BEVILACQUA (STMS)
Thesis supervisor: Philippe Joseph Rene ESLING (STMS)
Thesis supervisor: Geoffroy PEETERS (LTCI (EDMH))
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence

Abstract: In recent years, significant advances have been made in deep learning models for audio generation. Deep generative models have achieved impressive results in generating high-quality audio samples that reproduce the properties of a given training set distribution [4]. However, controlling such models remains an arduous and daunting task, which might hamper their creative use by both expert and non-expert users. Existing control methods mainly rely on massive sets of paired examples with labelled attributes. In real-life control problems, however, such labelled data are often scarce, task-oriented and subject to human biases. Musical applications in particular rely on very limited datasets that closely match a given artistic intention. Hence, deep generative models lack intuitive, personalized control methods that could leverage the non-linear topology of latent representations while promoting co-creative human-machine interaction. This PhD aims to overcome these limitations and to propose novel methods that provide dynamic, user-adapted control spaces for deep audio generative models. To do so, we propose a joint approach based on implicit regularization of the latent space and explicit transformation of a subset of its dimensions to provide user-centered control spaces. We target adaptive representation learning through self-supervised learning between the audio and control latent spaces, learning joint priors between them. The project will be co-supervised by three experts covering the fields of deep audio generative models, human-machine interaction and self-supervised few-shot learning. We expect high impact for the creative industries and for music practitioners (artists, educators and amateurs).

Doctoral candidate: Nabi Sarah