Doctoral research project number: 6608


Submission date: 3 December 2019
Title: Hierarchical temporal learning for multi-instrument and orchestral audio synthesis
Thesis supervisor: Jean BRESSON (STMS)
Thesis supervisor: Philippe Joseph Rene ESLING (STMS)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined

Abstract: Generative models are machine-learning approaches whose training can address two different optimization objectives. The first is to build a latent space that provides a low-dimensional representation of the data, possibly subject to various regularizations and constraints. This compressed representation aims to disentangle the factors of variation in the data. The second is the reconstruction of data or the synthesis of new elements by sampling this latent space. These systems are very promising because their latent space is a high-level, 'over-compressed' representation that can be controlled for several tasks, such as visualization, measurement, interpolation, and generation. However, one of the most prevalent problems of machine-learning algorithms applied to musical creativity is that they process only a single temporal scale or, at best, a finite set of small scales. Hence, one objective of this project is to develop an approach able to build a multi-scale hierarchical representation of audio, so that long-term dependencies can also be learned. A first step towards this idea is to build a functional Variational Autoencoder (e.g. VQ-VAE, NSynth) whose latent-space trajectories can be modelled by an auto-regressive model (e.g. RNN, WaveNet, WaveRNN). Iteratively building such latent spaces could lead to a model that generates audio with long-term dependencies. The second objective of this project is to address the generation of multiple instruments in a common framework. Hence, one could foresee the generation of audio signals conditioned on other signals, in complex musical settings such as orchestration. The project will also apply these types of systems to different musical styles (ambient, contemporary), in direct collaboration with composers. The goal of this PhD is both to provide a generative system based on raw audio and to evaluate its use and control for multivariate musical audio generation.
The quality and results of these models will be directly assessed through the creation of new musical pieces.

Doctoral candidate: Caillon Antoine