Description
Submission date: 10 October 2022
Title: Methods for big-data neuroimaging analyses
Thesis director:
Olivier COLLIOT (ICM)
Supervisor:
Baptiste COUVY-DUCHESNE (ICM)
Scientific field: Information and communication sciences and technologies
CNRS theme: Information sciences and life sciences
Abstract: The field of neuroimaging is at a turning point, owing to the availability of several large datasets such as the UK Biobank, which comprises more than 40,000 volunteers from the general population with deep phenotyping, multimodal MRI and genotyping data. In comparison, clinical samples currently comprise a few thousand individuals at most, though larger samples should become available soon. For example, the ARAMIS team is working on analysing the AP-HP (Assistance Publique - Hôpitaux de Paris) data, which comprise tens of thousands of individuals with diagnosis information and brain MRI. Such large datasets promise a finer understanding of the associations between the brain and disorders, as well as improved risk prediction, but they also raise computational and methodological challenges.
In particular, association analyses and prediction algorithms must deal with a number of features that still far exceeds the number of participants. We need efficient algorithms and models that scale up to the data (the UK Biobank sample is expected to grow to 100,000 participants) and that can combine information from different samples [Couvy-Duchesne et al., 2021]. Current approaches in neuroimaging include univariate association analyses, which we showed to exhibit a high false positive rate due to unaccounted-for data structure [Couvy-Duchesne et al., 2022]. In addition, prediction often relies on penalised regression or convolutional neural networks (CNNs), which become extremely costly to train or update on large samples and often require pooling raw data from the different studies. A specific limitation of current population samples is that the number of cases is often too small to study clinical phenotypes, which prevents them from being used to improve disease risk prediction beyond what smaller clinical samples allow. Finally, until large longitudinal datasets become available, we lack methods for understanding and predicting disease progression from cross-sectional data. In particular, we would like to identify early markers of disease, which could serve for risk prediction, prevention and early intervention. More generally, understanding the cascade of brain changes occurring in neurological or psychiatric disorders can clarify the aetiology of these disorders and also improve the prediction of individuals' risk and trajectories.
The PhD student will develop or adapt statistical methods to tackle the challenges of big-data neuroimaging described above.
Each method or approach will be extensively tested and evaluated on simulated and real (open-access) biobank data, already available and maintained in the ARAMIS project team. The methods will be implemented in open-source software and packages. The PhD project will also involve extensive data management and some MRI image processing.
Doctoral student: Delzant Elise