Description
Submission date: 18 July 2022
Titre: Adaptive representation learning for the gestural control of deep audio generative models
Thesis supervisor:
Frédéric BEVILACQUA (STMS)
Thesis supervisor:
Philippe Joseph Rene ESLING (STMS)
Thesis supervisor:
Geoffroy PEETERS (LTCI (EDMH))
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence
Abstract: In recent years, significant advances have been made in deep learning models for audio generation. Deep
generative models have achieved impressive results in generating high-quality audio samples that reproduce
the properties of a given training set distribution [4]. However, controlling such models remains an arduous
task, which can hamper their creative use by expert and non-expert users alike. Existing control methods
mainly rely on massive sets of paired examples with labelled attributes. In real-life control problems,
however, such labelled data are often scarce, task-oriented and subject to human biases. Musical
applications in particular rely on very limited datasets that closely match a given artistic intention. Hence, deep
generative models lack intuitive, personalized control methods that could leverage the non-linear topology
of latent representations while promoting co-creative human-machine interaction.
This PhD aims to overcome these limitations and to propose novel methods providing dynamic, user-adapted
control spaces for deep audio generative models. To do so, we propose a joint approach based on
implicit regularization of the latent space and explicit transformation of a subset of its dimensions to provide
user-centered control spaces. We target adaptive representation learning through self-supervised
learning between the audio and control latent spaces, by learning joint priors between these spaces. The
project will be co-supervised by three experts covering the fields of deep audio generative models, human-machine
interaction, and self-supervised few-shot learning. We expect high impact in the creative
industries and among music practitioners (artists, educators and amateurs).
PhD candidate: Nabi Sarah
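As a rough illustration of the idea of learning a mapping between a control (gesture) latent space and an audio latent space, the minimal sketch below aligns two hypothetical embedding spaces with a closed-form least-squares map. This is not the thesis method (which targets learned joint priors and self-supervised alignment); all array names, dimensions, and the synthetic paired data are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: align a hypothetical gesture latent space with
# a hypothetical audio latent space via a linear map W. This stands in for
# the far richer joint-prior learning described in the abstract.
rng = np.random.default_rng(0)

# Synthetic paired examples: 200 gestures (8-D) and their audio latents (16-D).
gesture_z = rng.normal(size=(200, 8))
true_map = rng.normal(size=(8, 16))            # unknown ground-truth relation
audio_z = gesture_z @ true_map + 0.01 * rng.normal(size=(200, 16))

# Closed-form least-squares alignment: W = argmin ||gesture_z @ W - audio_z||^2.
W, *_ = np.linalg.lstsq(gesture_z, audio_z, rcond=None)

# A new gesture could now be mapped into the audio latent space, where a
# generative decoder would synthesize the corresponding sound.
residual = np.linalg.norm(gesture_z @ W - audio_z) / np.linalg.norm(audio_z)
print(f"relative alignment residual: {residual:.4f}")
```

In practice such a linear map is far too rigid for the non-linear latent topologies mentioned above; it only makes concrete what "providing a control space over an audio latent space" means at the simplest level.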