Description
Submission date: 27 March 2025
Title: Handling the uncertainty of machine learning predictions in mass spectrometry proteomics
Thesis supervisor:
Nataliya SOKOLOVSKA (LCQB)
Thesis supervisor:
Thomas BURGER (CEA)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence
Abstract: Under a specific set of conditions, such as temperature, pressure, and the type of stationary and mobile phases used, each component has its own retention time, which can therefore serve as a characteristic property for identifying the different components of a mixture. Improving proteomic dataset completion methodologies while handling the associated uncertainty requires fulfilling three sub-goals. The first is to improve the reliability and interpretability of the retention time (RT) predictions delivered by state-of-the-art deep learning tools. To do so, we will: (i) rely on conformal prediction to provide confidence intervals for RT predictions; (ii) leverage domain adaptation (DA) to transfer a trained model across various experimental and technological liquid chromatography (LC) settings, avoiding the need for retraining.
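To illustrate item (i), the following is a minimal sketch of split conformal prediction applied to RT regression. It is not the method of the project itself: the function name `split_conformal_interval`, the synthetic data, and the noise level are assumptions made purely for illustration, and any RT predictor could stand in for the simulated one. The idea is to compute absolute residuals on a held-out calibration set and to use a finite-sample corrected quantile of these residuals as the half-width of the interval around each new prediction.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Split conformal intervals with about (1 - alpha) marginal coverage.

    cal_true, cal_pred: observed and predicted RTs on a calibration set
    that was not used to train the predictor.
    test_pred: predicted RTs for which intervals are wanted.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    # Nonconformity score: absolute residual on the calibration set.
    scores = np.abs(cal_true - cal_pred)
    n = scores.size

    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("calibration set too small for the requested alpha")

    # k-th smallest score is the half-width of every interval.
    q = np.sort(scores)[k - 1]
    return test_pred - q, test_pred + q


# Synthetic demonstration (hypothetical data, RT in minutes).
rng = np.random.default_rng(0)
rt_true = rng.uniform(0.0, 120.0, size=1000)
rt_pred = rt_true + rng.normal(0.0, 2.0, size=1000)  # simulated predictor

lower, upper = split_conformal_interval(
    rt_true[:500], rt_pred[:500], rt_pred[500:], alpha=0.1
)
covered = (rt_true[500:] >= lower) & (rt_true[500:] <= upper)
print(f"empirical coverage: {covered.mean():.3f}")
```

The coverage guarantee of this scheme is marginal and assumes that calibration and test peptides are exchangeable. That assumption is what a change of LC setting may break, which is one reason the project pairs conformal prediction with domain adaptation.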
To provide expertise and supervision in machine learning, statistics, and proteomics, the PhD will be co-supervised by N. Sokolovska and T. Burger. The candidate is expected to hold a Master 2 in Computer Science, Applied Mathematics / Statistics, or Bioinformatics, or an equivalent engineering degree. A background in statistics, optimization, artificial intelligence, or any related field will be appreciated. The ideal candidate will propose, develop, and numerically test new methods. The candidate is expected to provide theoretical foundations for these methods and to implement them in Python, so that the final product can be made publicly available for research purposes. A strong interest in biological applications is highly appreciated.