Submission date: January 11, 2023
Title: Synergizing data-driven models and high-throughput experiments to understand protein function and evolution
Thesis supervisor: Martin WEIGT (LCQB)
Thesis supervisor: Francesco ZAMPONI (LPENS)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences
Abstract: Unveiling the sequence-structure-function relationship of proteins, and quantitatively understanding its consequences for protein evolution, are among the most fundamental questions in biology. In recent years, the accumulation of massive sequence data from next-generation sequencing, together with advances in data-driven modeling based on statistical physics, machine learning and artificial intelligence, has begun to open radically new avenues that are revolutionizing the field. The present project is mostly theoretical and computational in nature but relies on close collaborations with experimental groups. It aims to advance our knowledge along the following lines:
1) We aim to characterize the fine organization of protein families in sequence space to unveil the sequence motifs underlying functional specificity.
2) We aim to build quantitative models of protein evolution to understand how novel protein functions emerge.
3) We aim to design methods for coevolution-aware ancestral reconstruction that overcome the current limitations of ancestral reconstruction algorithms.
The collaboration with experimentalists will provide access to state-of-the-art data and make it possible to test our predictions.