Doctoral research project number: 8132


Submission date: 6 April 2021
Title: Deep Learning for the reconstruction of protein-protein interactions between paralogs and mutated proteins
Thesis supervisor: Alessandra CARBONE (LCQB)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences

Abstract: Genomics makes wide use of machine learning to capture dependencies in data and derive new biological hypotheses. By effectively leveraging large data sets, deep learning (DL) has, in the last couple of years, started to transform genomics in the same way it transformed computer vision and natural language processing. DL models provide significantly higher accuracy than previous state-of-the-art methods and offer an opportunity to accelerate and create innovative applications in genomics.
Protein-protein interaction (PPI) networks play a key role in biology and medicine in the interpretation of protein functions in cellular processes. Recently, we developed IMPRINT, a DL architecture that can learn biologically significant interaction motifs and identify the binding site of the interaction on each of the two proteins. Its data augmentation scheme significantly improves the performance of IMPRINT compared to other methods.
In this thesis, we wish to apply statistical inference and deep learning to protein sequences in order to scale up PPI reconstruction and bring it closer to biological reality by considering mutated proteins. Namely, we want to focus on similar sequences: homologous proteins, that is, sequences with a common ancestor; paralogous sequences, that is, sequences with a common ancestor that were obtained by duplication of a gene within the same genome; or simply protein sequences that differ by one or a few mutations. We want to develop end-to-end DL architectures that take two or more proteins as input, distinguish interaction patterns between proteins of similar sequence, and identify differences at the scale of a single mutation. To this end, we shall work on data augmentation schemes driven by biological information and on the integration of biological information into the architectures.
These questions are of interest to the international bioinformatics and genomics community. Indeed, we will couple Deep Neural Networks (DNN) and evolution to achieve what has not been possible for decades. We will make it possible to analyse ab initio the interactions between tens of thousands of proteins and bring individual PPI networks into clinics. We will also contribute to the creation of an environment of DL models for genomics.