Doctoral research project no. 8306

Description

Submission date: 11 April 2022
Title: Deep Learning for the prediction of the effects of mutations in proteins and protein-protein interactions
Thesis supervisor: Alessandra CARBONE (LCQB)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences

Abstract: Protein-protein interactions (PPIs) play a key role in biology and medicine, as they are essential for interpreting protein functions in cellular processes. The ab initio reconstruction of highly precise PPI networks for individual genomes from human populations (African, Caucasian…), together with the ab initio reconstruction of the phenotypic mutational landscape of their proteins, constitutes a fundamental step. Ab initio networks are independent of experimental and literature knowledge, which makes it possible to identify proteins whose interactions have not been previously observed. This unbiased approach is fundamental in precision medicine, where reconstructing changes in PPI network topology is of crucial importance. Indeed, each network carries information on an estimated average of ~3000 nonsynonymous genetic variants per individual genome (1000 Genomes Project Consortium, 2012). Only some of these variants disrupt protein function and are likely to cause disease. Identifying the alterations that change key characteristics of a protein (thermodynamic stability, ligand binding, cellular localization) and, consequently, network topology is crucial to understanding diseases. The collection of PPI networks for a population will be a rich source of new information to enhance medical knowledge and raise new questions. Recent approaches to estimating binding affinity changes in PPIs (mCSM, iSEE, FoldX, MuPIPR), including the sequence-based deep learning method MuPIPR, performed better than more classical approaches but also showed that they are far from solving the problem, reaching a correlation of only 0.25 with experimentally measured binding affinity changes on reference datasets (SKEMPI v2). A contribution to these fundamental questions will clearly have an impact in biology and medicine.
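For reference, the binding affinity change that these methods predict is usually expressed as a free-energy difference, ΔΔG, and the 0.25 figure is a correlation between predicted and measured values. The sketch below assumes the common conventions ΔG = RT ln K_d and ΔΔG = ΔG_mut − ΔG_wt (as typically used when deriving ΔΔG from dissociation constants in datasets such as SKEMPI v2) and assumes Pearson correlation as the evaluation metric; the function names and example numbers are illustrative, not taken from SKEMPI.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant: ΔG = RT ln Kd."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ddg(kd_wt: float, kd_mut: float, temp_k: float = 298.15) -> float:
    """ΔΔG = ΔG_mut - ΔG_wt; positive values mean the mutation weakens binding."""
    return delta_g(kd_mut, temp_k) - delta_g(kd_wt, temp_k)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between predicted and experimental ΔΔG values.

    Assumes both lists have the same length and non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # A tenfold weaker Kd (1 nM -> 10 nM) at 25 °C gives ΔΔG ≈ +1.36 kcal/mol.
    print(round(ddg(1e-9, 1e-8), 2))
```

Under these conventions, each tenfold change in K_d corresponds to about 1.36 kcal/mol at room temperature, which gives a sense of the scale on which predicted and measured ΔΔG values are compared.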
The development of original deep learning (DL) approaches will also have an impact in computer science, by bringing to the field of AI new kinds of data complexity and the computational challenges of handling them. In this thesis we wish to tackle two related problems concerning the effects of protein mutations, which differ in complexity. First, we wish to develop an end-to-end deep learning framework to estimate the effects of mutations on PPIs from sequences only. More precisely, we want to estimate the changes in binding affinity (BA) and buried surface area (BSA) upon mutation for pairs of protein sequences known to interact. Second, we wish to construct another end-to-end deep learning framework to estimate the functional and structural effects of mutations in a single protein sequence, without knowledge of its partner. This is an independent problem whose solution will likely benefit from what is learned in the first problem, which explicitly assumes that the two proteins are in physical contact. Both problems are formulated on protein sequences, the primary information describing a protein. See the attached file for a detailed description.
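The input and output of the first problem can be pictured as a function that takes two interacting sequences and a point mutation and returns predicted changes in BA and BSA. The sketch below is a hypothetical, untrained toy, not the architecture planned in this project: it assumes a bag-of-residues encoder (per-residue embeddings averaged over each sequence) in place of a real deep network, and it scores the mutant and wild-type complexes with the same function and subtracts the scores, so that reversing a mutation flips the sign of the prediction. All names (`embed`, `complex_score`, `predict_changes`) and the random weights are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMB_DIM = 8
N_OUTPUTS = 2  # hypothetical outputs: (binding affinity, buried surface area)

rng = random.Random(0)
# Hypothetical randomly initialised parameters; a real model would learn these.
EMBEDDING = {aa: [rng.gauss(0, 1) for _ in range(EMB_DIM)] for aa in AMINO_ACIDS}
HEAD = [[rng.gauss(0, 1) for _ in range(2 * EMB_DIM)] for _ in range(N_OUTPUTS)]


def embed(seq: str) -> list[float]:
    """Mean of per-residue embeddings: a stand-in for a learned sequence encoder."""
    vecs = [EMBEDDING[aa] for aa in seq]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def complex_score(seq_a: str, seq_b: str) -> list[float]:
    """Score a protein pair; sum and product features make it symmetric in (a, b)."""
    ea, eb = embed(seq_a), embed(seq_b)
    pair = [x + y for x, y in zip(ea, eb)] + [x * y for x, y in zip(ea, eb)]
    return [sum(w * p for w, p in zip(row, pair)) for row in HEAD]


def mutate(seq: str, pos: int, new_aa: str) -> str:
    """Return seq with the residue at 0-based position pos replaced by new_aa."""
    return seq[:pos] + new_aa + seq[pos + 1:]


def predict_changes(seq_a: str, seq_b: str, pos: int, new_aa: str) -> list[float]:
    """Predicted (ΔBA, ΔBSA) for a point mutation at position pos of seq_a."""
    s_mut = complex_score(mutate(seq_a, pos, new_aa), seq_b)
    s_wt = complex_score(seq_a, seq_b)
    return [m - w for m, w in zip(s_mut, s_wt)]


if __name__ == "__main__":
    print(predict_changes("ACDKLW", "GHIK", 2, "E"))
```

Predicting the difference of two scores from a shared function, rather than regressing the change directly, is one simple way to build in the property that a mutation and its reverse receive opposite predicted effects.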