Description
Submission date: 24 November 2021
Title: Asynchronous, decentralized, decoupled & inhomogeneous distributed training of machine learning models
Thesis supervisor:
Edouard OYALLON (ISIR (EDITE))
Scientific field: Engineering sciences
CNRS theme: Artificial intelligence
Abstract: Machine learning has attracted a great deal of attention, but it has also brought new problems of extremely large and continuously growing scale: there is always more data, which requires more computational power, and models which require ever larger hyperparameter grid searches. This is particularly true of deep learning models, which rely on every available ad-hoc engineering trick and are extremely greedy in terms of computational resources. To face these challenges, distributed optimization is emerging as a crucial component. The main idea is to split a model of size N across K machines, each handling a part of size N/K, in order to obtain a speed-up linear in K (a toy illustration of the decentralized setting is sketched after this summary). Of course, employing K machines leads to a compromise in terms of connectivity between the machines and heterogeneity of the environment, so the speed-up obtained in practice is often sublinear. Interestingly, this field lies at the boundary of optimization, graph theory and Markov chains, with a broad range of applications. This PhD project addresses algorithms which are both decentralized and asynchronous, meaning that there is no master node coordinating the worker nodes, and that some nodes may incur substantial delays due to poor connectivity or heterogeneous hardware.
The objective of this PhD is to address the following points:
- Introducing new methods to handle distributed asynchronous decentralized training.
- Studying the robustness of those algorithms from a theoretical perspective.
- Studying the practicality of those algorithms.
This PhD will be funded by the ANR project ADONIS and will be carried out between Sorbonne University (France) and Concordia University (Canada). It will be jointly supervised by Dr. Edouard Oyallon (CR CNRS) and Prof. Eugene Belilovsky (Assistant Professor, Mila).
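To make the decentralized setting described above concrete, the following is a minimal, self-contained sketch of gossip-based decentralized gradient descent on a toy quadratic problem. It is only an illustration, not the algorithm developed in this project: the ring topology, the mixing matrix W, the step size and the objective are all illustrative assumptions, and asynchrony and communication delays are deliberately left out.

```python
import numpy as np

# Toy illustration (not the thesis algorithm): K workers each hold a local
# copy of a parameter vector and a local quadratic objective. At every step,
# each worker takes a local gradient step and then averages its parameters
# with its neighbours through a doubly stochastic gossip matrix W.

K, dim, steps, lr = 8, 10, 200, 0.05
rng = np.random.default_rng(0)

# Ring communication graph: each worker mixes with itself and its 2 neighbours.
W = np.zeros((K, K))
for i in range(K):
    W[i, i] = 1 / 3
    W[i, (i - 1) % K] = 1 / 3
    W[i, (i + 1) % K] = 1 / 3

# Heterogeneous local data: worker i pulls towards its own target t_i;
# the global optimum is the average of all targets.
targets = rng.normal(size=(K, dim))
x = np.zeros((K, dim))           # one row of parameters per worker

for _ in range(steps):
    grads = x - targets          # gradient of 0.5 * ||x_i - t_i||^2
    x = W @ (x - lr * grads)     # local gradient step, then gossip averaging

consensus_error = np.linalg.norm(x - x.mean(axis=0))
distance_to_opt = np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0))
print(f"consensus error: {consensus_error:.3e}, distance to optimum: {distance_to_opt:.3e}")
```

The point of the sketch is that no machine ever holds the global average: each worker only communicates with its neighbours in the graph (through the rows of W), which is what "decentralized" means here; the asynchronous and delayed variants studied in this project relax the requirement that all workers update in lockstep.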
PhD candidate: Nabli Adel