Doctoral research project number: 8552

Description

Submission date: 6 July 2023
Title: Deciphering the complexity of proteoform interactions with evolutionary- and physically-informed protein language models
Thesis supervisor: Elodie LAINE (LCQB)
Co-supervisor: Sergei GRUDININ (LJK)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences

Abstract: Recent advances in high-throughput sequencing, imaging and proteomics have revealed considerable complexity behind the classical protein sequence-structure-function paradigm. The view that one gene codes for one protein, which folds into a unique 3D structure to perform a specific function, is a strong oversimplification. In multicellular organisms in particular, alternative splicing (AS) can produce several protein isoforms, or proteoforms, from the same gene. These proteoforms may in turn adopt different 3D shapes and interact with different cellular partners. In humans, about 100,000 protein-coding transcripts are known, originating from about 20,000 genes. The combinatorial space of possible interactions between them is huge. AS therefore has substantial potential to expand proteome diversity and rewire interaction networks. The goal of this doctoral project is to develop a deep learning architecture, inspired by natural language processing, that can predict the determinants of protein interaction specificity. The candidate will build on state-of-the-art protein language models and adapt such architectures to handle hierarchical, alternative-splicing-aware and evolutionarily meaningful graph-based data structures encoding protein isoform sequences. They will explore the possibility of informing the learnt representations with physical priors (e.g. through 3D structural models). The expected outcome is an interpretable AI system for recapitulating the protein diversity observed in nature, for generating new diversity beyond what is observed, and for identifying the sequence- and structure-based patterns underlying protein interaction strength and specificity in vivo.
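
To give a rough sense of scale for the combinatorial space mentioned in the abstract, the short sketch below counts unordered pairs of distinct items at the gene level and at the transcript level, using the approximate figures quoted there (20,000 genes, 100,000 protein-coding transcripts). It is only an illustrative back-of-the-envelope calculation: the function name `pair_count` is hypothetical, self-interactions and higher-order complexes are ignored, and nothing here is part of the project's actual methodology.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered pairs of distinct items among n items."""
    return comb(n, 2)


# Approximate figures quoted in the abstract (assumed round numbers).
N_GENES = 20_000
N_TRANSCRIPTS = 100_000

gene_pairs = pair_count(N_GENES)
transcript_pairs = pair_count(N_TRANSCRIPTS)

print(f"gene-level pairs:       {gene_pairs:,}")
print(f"transcript-level pairs: {transcript_pairs:,}")
print(f"expansion factor:       {transcript_pairs / gene_pairs:.1f}x")
```

Under these assumptions, moving from genes to transcripts takes the number of candidate pairs from roughly 200 million to roughly 5 billion, about a 25-fold increase, since the pair count grows quadratically with the number of items.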



Doctoral candidate: Nguyen Van Julien