Description
Submission date: 9 April 2023
Title: Physics-informed neural audio synthesis
Thesis supervisor:
Axel ROEBEL (STMS)
Co-supervisor:
Guillaume DORAS (STMS)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence
Abstract: Digital voice or musical instrument synthesis aims to generate realistic audio with controllable pitch, timbre, or expressivity. In the past, several synthesis methods based on physical or signal models have been successfully studied and made available to the general public. More recently, the advent of neural audio synthesis has yielded spectacular results. However, despite their quality, these deep neural models remain black-box models whose inner synthesis mechanisms cannot be easily interpreted. Inherently interpretable audio synthesis systems have the advantage of providing human-friendly parameters that can be learned from data for sound analysis and/or manipulated for intuitive control in sound synthesis. Interpretable models, though, typically require structural constraints derived from domain knowledge.
In this thesis, we propose to derive these constraints from the physical characteristics of the human vocal apparatus and various musical instruments, which can all be modelled as passive non-linear oscillating systems. This approach, referred to as “physics-informed neural network” modelling, is motivated by the results obtained in other scientific fields confronted with complex dynamical systems. In particular, the port-Hamiltonian Neural Networks (pHNNs) approach has proven its ability to model complex real-world physical systems that cannot be fully described analytically, and to learn their dynamics from data samples. We hypothesise that coupling strong physics-based priors with neural audio synthesis will 1) reduce the space of admissible solutions and help to generate high-quality sounds with less data, and 2) provide interpretable physical parameters that will open new perspectives for both sound analysis and synthesis. The synthesis quality of these physics-informed neural instruments will then be assessed not only via quantitative and qualitative evaluations but also via usability case studies within IRCAM's artistic community.
Doctoral student: Linares Maximino