Doctoral research project number: 8579

Description

Submission date: 17 September 2023
Title: Accurate identification of human protein partners with deep learning and coevolution analysis leading to the identification of binding residues
Thesis supervisor: Alessandra CARBONE (LCQB)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences

Abstract: The vast amount of existing and future human sequencing data, covering protein-coding genes, their spliced forms, paralogs active in different tissues, and pathogenic and benign mutations in multiple individuals and populations, calls for the accurate reconstruction of interaction networks between protein-coding genes involved in multiple molecular activities. The complexity of these networks is exceptionally high for many reasons. Proteins may be intrinsically disordered, promiscuous or highly specific; they may bind their partners strongly or weakly using different interaction modalities; they may use different regions of their surface to bind different partners; and their binding may be coupled or subject to conformational changes. Over the last 20 years, the problem of ab initio protein-protein interaction (PPI) reconstruction has been studied by several groups, including this team, which proposed the first large-scale docking-based search for protein partner identification. These attempts were applied to proteins with known structural conformations and suffered from the high computational time required by the approach. The limitations imposed by the structural features of proteins, which define the complexity of the PPI problem as described above, strongly support the interest in developing sequence-based computational methods for partner identification. The recent computational leap made with AI and deep learning approaches in protein science paves the way for our recent work and this project. We recently considered the latest generation of protein language models, ESM2, which led to the rapid reconstruction of the 3D structure of millions of metagenomic sequences, and used them to design a deep learning architecture, SENSE-PPI (Volzhenin, Bittner, Carbone 2023, submitted), that can be used to search for protein partners in very large sets of proteins, overcoming the time limitations of docking approaches.
It screens 10,000 proteins against themselves in a matter of hours, enabling the reconstruction of proteomes at the genome level. It can also be successfully applied to human proteins to search for intrinsically disordered protein partners. SENSE-PPI generalizes in several ways to model and non-model organisms. It has been compared with the most recent PPI reconstruction methods and outperforms all of them, distinguishing specific functional subnetworks more clearly. SENSE-PPI will be our starting point for improving prediction accuracy through the design of a novel deep learning network that can accurately address the variety of protein features in a specific model species, Homo sapiens in our case. Interactions between proteins, possibly disordered, their spliced forms, and their paralogs active in different tissues need to be distinguished, and no deep learning method currently exists for this challenging task. One main aim of this project is to extract information on the interacting residues from the new deep learning architecture, a challenging task that has been addressed before with no success. To address these questions, we wish to combine the latest generation of protein language models, ESM2, with extra information on amino-acid co-evolution (i.e. mutational compensatory patterns preserving the function and/or the structure of a protein during the evolution of different species) of conserved residues (Champeimont, ….. Carbone, Sci Rep 6, 26401, 2016; Douam …. Carbone, PLoS Pathog 14, e1006908, 2018), known to be involved in functional interactions, conformational changes and folding, and on 3D model reconstructions based on AlphaFold2/ESMFold/RoseTTAFold.

Interest for the international community: This project will help shed light on a fundamental question in neuroscience: whether and how two neurotransmitters within the same terminal influence each other’s signaling and release.
More concretely, it will address the computational questions raised in the ANR grant ALLEGRO “Molecular partners of AcetyLchoLinE – Glutamate cotRansmissiOn in synaptic vesicles of striatal cholinergic interneurons”, CE11 “Characterization of structures and structure-function relations of biological macromolecules”, funded in 2023. See detailed description.
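
As an illustration of the co-evolution signal mentioned in the abstract (compensatory mutations between residue positions), the sketch below computes the mutual information between two columns of a multiple sequence alignment. This is a deliberately simple stand-in and not the method of the cited works or of this project; the function name `mutual_information`, the toy alignment and the choice to treat every character (including gaps) as an ordinary symbol are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical input).
    Every character, gaps included, is treated as an ordinary symbol,
    which is a simplifying assumption of this sketch.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)             # residue counts in column i
    count_j = Counter(seq[j] for seq in msa)             # residue counts in column j
    count_ij = Counter((seq[i], seq[j]) for seq in msa)  # joint counts for the pair

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        # p(a,b) / (p(a) * p(b)) rewritten with raw counts
        mi += p_ab * log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi


# Toy alignment invented for illustration: columns 1 and 3 change together
# (K with E, R with D), while column 0 is fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

if __name__ == "__main__":
    print(f"MI(columns 1, 3) = {mutual_information(toy_msa, 1, 3):.2f} bits")
    print(f"MI(columns 0, 1) = {mutual_information(toy_msa, 0, 1):.2f} bits")
```

In this toy case the covarying pair of columns carries 1 bit of mutual information, while any pair involving the conserved column carries none. Real alignments would additionally require sequence weighting and corrections for phylogenetic bias, which this sketch omits.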



Doctoral candidate: Kamali Persia Jana