Doctoral research project number: 5541


Submission date: 16 December 2018
Title: Advanced statistical modeling of biological sequences
Thesis advisor: Martin WEIGT (LCQB)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined

Abstract: Biology is undergoing a deep transformation into a data-rich science. Approaches based on statistical modeling and learning are key to making sense of the raw data produced by numerous high-throughput experiments, to extracting the information hidden in these data, and to increasing our understanding of complex biological processes. Within this doctoral project, we will explore modern statistical and machine learning approaches, including in particular restricted Boltzmann machines and adversarial networks, for an unsupervised description of protein families, i.e. groups of proteins that descend from a common ancestor in evolution and have conserved important structural and functional characteristics despite strongly diverged sequences. Our aim is to construct parsimonious, interpretable and generative models. Interpretability is important for improving our understanding of the selective forces underlying the natural evolution of biological molecules. The generative character of these models is important for the development of novel, data-driven approaches to de novo protein design.

Doctoral student: Shimagaki Kai