Description
Submission date: 2 April 2024
Title: Integrative generative probabilistic models for protein families
Thesis supervisor:
Martin WEIGT (LCQB)
Thesis supervisor:
Andrea PAGNANI (Politecnico di Torino)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences
Abstract: Protein biology is undergoing a transformative phase, driven by the rapid growth of protein sequence data and the adoption of sophisticated statistical and machine learning methods. Two primary directions have emerged in this field: protein-family-specific generative models, which focus on evolutionarily related proteins, and general protein language models, which cover the entire known protein sequence space. However, both approaches currently treat protein sequences as abstract strings, neglecting the wealth of experimental data characterizing protein structure and function. To address this gap, this project proposes integrative generative probabilistic models that incorporate sequence, functional, structural, and interaction data. By leveraging these data types and combining computational modeling with experimental validation, the approach aims to elucidate the intricate sequence-structure-function relationship in proteins. The goal is to drive advances in protein engineering and design through data-driven generative modeling and to foster interdisciplinary collaboration between the computational and biological sciences.
Doctoral candidate: Peinetti Giovanni