Description
Submission date: 1 January 1900
Titre: Manifold-based representations of musical signals and construction of generative spaces
Thesis supervisor:
Gérard ASSAYAG (STMS)
Scientific domain: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
The aim of this research project is to explore learning algorithms that could provide a relevant representation of music by relying on the concept of manifolds, in order to make significant progress in automatically modeling, extracting, and generating musical structure from sound signals and symbolic sequences.
A manifold can be seen as a mathematical representation of the more general idea of a structure. It can be defined as a smooth subspace embedded in a higher-dimensional space that represents a continuous set of properties of the points it contains.
Several machine learning algorithms rely on manifold-based representations to process complex data. Some of these techniques, such as those already used in image processing, allow the extraction of a non-linear structure embedded in a higher-dimensional feature space. Unfolding this manifold makes it convenient for analysis, information retrieval and representation. More recent techniques such as deep learning are based on the implicit extraction of a latent space, where the relevant information is assumed to follow a lower-dimensional distribution.
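As a hypothetical illustration of this "unfolding" (not part of the project itself), the sketch below applies Isomap from scikit-learn to the classic swiss-roll dataset: a 2-D surface rolled up in 3-D space. Isomap recovers a flat 2-D embedding by preserving geodesic (along-the-surface) distances; the dataset, neighbor count, and dimensions are arbitrary choices for the example.

```python
# Sketch: unfolding a non-linear manifold embedded in a higher-dimensional
# space, using Isomap on the swiss-roll toy dataset (illustrative only).
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 1000 points lying on a 2-D surface rolled up inside 3-D space.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# Isomap approximates geodesic distances through a nearest-neighbor graph,
# then finds a low-dimensional embedding that preserves them.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X.shape)          # (1000, 3): points in the ambient space
print(embedding.shape)  # (1000, 2): unfolded manifold coordinates
```

Once unfolded, distances in the 2-D embedding reflect distances along the surface itself, which is what makes such representations convenient for analysis and retrieval.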
Two major learning paradigms have been explored in past years: supervised learning and unsupervised learning. Supervised learning requires that the structure be specified from the outside, and so does not seem to fit the objectives of this project.
Unsupervised learning, by contrast, provides a more adequate framework, as it does not impose any prior assumptions on the dataset.
Furthermore, many unsupervised learning algorithms are based on generative architectures, which is of prime importance since we aim for a system able to synthesize directly from the structure it has learned.
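A minimal numpy-only sketch of this generative idea follows, with PCA standing in for the richer generative architectures the project targets (e.g. variational autoencoders): a low-dimensional latent space is learned from data, and new points are synthesized by sampling latent codes and decoding them back. All names and dimensions here are illustrative assumptions.

```python
# Sketch: learn a latent space, then generate by sampling and decoding.
# PCA (via SVD) is a deliberately simple stand-in for a generative model.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 500 points lying near a 2-D plane embedded in 10-D space.
latent_true = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent_true @ mixing + 0.01 * rng.normal(size=(500, 10))

# "Encode": fit a 2-D linear latent space with PCA.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]              # (2, 10) latent basis vectors
Z = (X - mean) @ components.T    # latent codes of the training data

# "Generate": sample new latent codes, then decode into data space.
z_new = rng.normal(size=(5, 2)) * Z.std(axis=0)
X_new = mean + z_new @ components

print(Z.shape)      # (500, 2): latent codes
print(X_new.shape)  # (5, 10): newly synthesized points
```

The key point the project exploits is that generation happens directly in the learned structure: new material is produced by moving in the latent space, not by editing the raw data.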
The properties of these representations seem to be a natural meeting point between the musical and computational worlds. Indeed, in the sound realm, this representation has the advantage of providing a structured [1] organization of the data, while allowing the transformation and manipulation of this manifold directly in the higher-dimensional space where it is embedded (an example is the work of Tymoczko, who uses these properties to model symbolic musical patterns [2]).
Another advantage of a manifold-based representation of music is that it could easily be linked with interaction, by relying on gesture information to navigate through the manifold structure. However, in order to understand complex creative distributions, we will first need to provide representations based on the underlying nature of musical data. One of the most prominent characteristics of musical signals is their inherent multi-dimensional nature, at both the spectral and temporal levels. Hence, multiple layers of information can be observed, from the microscopic (spectral relationships of timbre and dynamics) up to the macroscopic (notes and sequences) level. Therefore, in order to truly understand musical audio signals, we must handle multi-scale structures in the signal. By carefully handling these properties, we might obtain an accurate representation able to ease analysis, processing, generation and generalization of the learned information.
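The multi-scale nature described above can be sketched with a toy short-time Fourier analysis: the same signal analysed with two window sizes, where short windows resolve fast, microscopic spectral events and long windows resolve slower, macroscopic structure. The signal, sample rate, and window lengths are arbitrary choices for illustration.

```python
# Sketch: one signal, two analysis scales (illustrative only).
import numpy as np

sr = 8000                              # assumed sample rate (Hz)
t = np.arange(sr) / sr                 # one second of audio
signal = np.sin(2 * np.pi * 440 * t)   # a steady 440 Hz tone

def stft_frames(x, win):
    """Magnitude spectra of non-overlapping frames of length `win`."""
    n_frames = len(x) // win
    frames = x[: n_frames * win].reshape(n_frames, win)
    return np.abs(np.fft.rfft(frames * np.hanning(win), axis=1))

micro = stft_frames(signal, 256)    # fine time resolution, coarse frequency
macro = stft_frames(signal, 2048)   # coarse time resolution, fine frequency

print(micro.shape)  # (31, 129):  many frames, few frequency bins
print(macro.shape)  # (3, 1025):  few frames, many frequency bins
```

This time-frequency trade-off is one reason a single fixed-scale representation cannot capture both timbral detail and note-level structure at once.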
Doctoral candidate: Chemla--Romeu-Santos Axel