Doctoral research project number: 8450

Description

Submission date: 22 February 2023
Title: Efficient deep generative models through topology-merging subnetworks for embedded audio synthesis
Thesis supervisor: Philippe Joseph Rene ESLING (STMS)
Thesis supervisor: Philippe CODOGNET (Univ. of Tokyo)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence

Abstract: Deep learning models have provided extremely successful methods in most application fields, obtaining unprecedented accuracy in various tasks. However, the consistently overlooked downside of deep models is their massive complexity and tremendous computation cost. Besides hampering model interpretability, the energy and computational costs of such architectures raise crucial issues of environmental sustainability. This aspect is especially critical in audio applications, which rely heavily on specialized embedded hardware with real-time constraints. Although deep generative models are now able to synthesize waveform data with unprecedented quality, they still require costly specialized hardware and long inference times. Hence, the lack of work on efficient lightweight deep models is a significant limitation for the real-life use of deep models on resource-constrained hardware. The goal of this PhD is to explore the recently proposed lottery ticket hypothesis in the setting of generating highly variable and complex audio data with multiple modes. This hypothesis states that randomly initialized neural networks already contain extremely sparse subnetworks that could reach higher accuracy than their larger counterparts if they were trained in isolation. Hence, finding these subnetworks implies that the same task could be solved in a lightweight, memory- and energy-efficient way. However, most research in this direction has been applied to large and homogeneous datasets. This PhD aims to extend and analyze these methods in the case of high dataset variability with a small amount of available data (low-shot) per mode. To do so, we will explore the possibility of obtaining different subnetworks for specific data modes, while avoiding the need for repeated training cycles.
We will then study mode connectivity in both the obtained representation spaces and the weights of the resulting subnetworks, using tools from information bottleneck theory. Based on this, we will explore the notion of topology-merging subnetworks, where efficient lightweight networks are trained jointly for different modes while merging their respective computations. This PhD will lead to the development of innovative musical instruments, providing users with diverse creative control over deep generative audio models on constrained hardware and embedded audio synthesizers.
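
As an illustration of the lottery ticket hypothesis mentioned in the abstract, the sketch below shows iterative magnitude pruning with a reset to the initial weights, which is one commonly used way of searching for such sparse subnetworks. It is not the method of this project: the abstract does not specify a pruning procedure, and it aims precisely at avoiding these repeated training cycles. The model is a toy linear regressor rather than a deep audio model, and all names (`train`, `prune_smallest`, `find_ticket`) and the synthetic data are assumptions made for the example.

```python
import numpy as np

def train(w_init, mask, X, y, steps=200, lr=0.1):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    w = w_init * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def prune_smallest(w, mask, fraction):
    """Remove the given fraction of surviving weights with the smallest magnitude."""
    alive = np.flatnonzero(mask)
    n_prune = int(fraction * alive.size)
    new_mask = mask.copy()
    if n_prune > 0:
        order = alive[np.argsort(np.abs(w[alive]))]
        new_mask[order[:n_prune]] = 0.0
    return new_mask

def find_ticket(X, y, w_init, rounds=3, fraction=0.5):
    """Alternate training and pruning, always restarting from the initial weights."""
    mask = np.ones_like(w_init)
    for _ in range(rounds):
        w = train(w_init, mask, X, y)          # train the current subnetwork
        mask = prune_smallest(w, mask, fraction)  # drop its weakest weights
    # Final training of the sparse subnetwork in isolation, from w_init.
    return mask, train(w_init, mask, X, y)

# Hypothetical toy data: only the first two of 16 inputs matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
true_w = np.zeros(16)
true_w[:2] = [3.0, -2.0]
y = X @ true_w
w_init = rng.normal(scale=0.1, size=16)

mask, w_ticket = find_ticket(X, y, w_init)
print("surviving weights:", np.flatnonzero(mask))
```

In this toy setting the surviving subnetwork keeps only the inputs that actually drive the target, and training it alone from the original initialization recovers the same solution as the dense model. Each pruning round requires a full training run, which is the cost the project proposes to avoid.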



Doctoral candidate: Genova David