Description
Filing date: 10 December 2021
Title: Multi-FedLS: A Scheduler of Federated Learning Applications in a Multi-Cloud Environment
Thesis supervisor:
Pierre SENS (LIP6)
Co-supervisor:
Luciana ARANTES (LIP6)
Scientific field: Information and communication sciences and technologies
CNRS theme: Systems and networks
Abstract: Federated Learning (FL) is a new area of distributed Machine Learning (ML) that emerged to address data privacy concerns. An FL application consists of one server and many clients, each of which has access to its own local, private dataset. The server coordinates the whole execution and keeps a global learning model. Each client trains its local model for several epochs and sends its updates to the server, which aggregates them. This approach is attractive in many domains because it allows private datasets from multiple institutions to be used to train a model without sharing the data among them.
Today, training a local ML model requires a very large amount of data, and most institutions cannot afford local data centers to store it all. One viable option is to use cloud storage services, which cloud providers offer with different privacy guarantees and levels of data availability. For example, on Google Cloud, users can store their data in a multi-region configuration, in which the data is available in different regions of the same country, and determine who can access it.
Besides storage, cloud providers offer different services for executing an application. For example, users can create VMs with different configurations and have full control over them, in a service generically called Infrastructure-as-a-Service (IaaS). Cloud providers usually offer two main markets for deploying VMs: on-demand and preemptible (or spot). The on-demand market guarantees that the VM runs from the moment the user requests it until the user terminates it. The preemptible (or spot) market, on the other hand, uses the provider's spare capacity to offer the VM at a substantial discount, but the provider can revoke the VM at any time.
In this thesis, we propose Multi-FedLS, a Federated Learning scheduler that executes FL applications in a multi-cloud environment under budget and deadline constraints, taking into account the current location of each client's dataset. Because an FL application is executed across different institutions, we do not control where the datasets are placed: one institution may store its dataset on Amazon Web Services and another on Google Cloud. Moreover, the whole FL execution relies on communication between the server and the clients, so Multi-FedLS must consider the communication delay and cost between different regions of a cloud provider and between different cloud providers. Multi-FedLS reduces costs by using preemptible (or spot) VMs as much as possible, since they are substantially cheaper than on-demand VMs. However, because these VMs can be revoked at any time, Multi-FedLS must use fault-tolerance techniques, such as checkpointing and/or task migration, to handle revocations and thus ensure that applications run to completion within their constraints.
Doctoral student: Correia Brum Rafaela
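Illustration: the abstract states that the server aggregates the updates sent by the clients but does not name the aggregation rule. The sketch below assumes a sample-weighted average in the style of federated averaging (FedAvg), which is a common choice and not necessarily the one used in the thesis. The function name `aggregate_updates` and the flat-list representation of model parameters are hypothetical, chosen only for this example.

```python
from typing import List, Sequence


def aggregate_updates(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-weighted average of client model parameters (FedAvg-style).

    Assumption: each client sends a flat list of parameters together with
    the number of local training samples it used. The abstract does not
    specify the aggregation rule, so this is only one plausible instance.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all client updates must have the same length")
    # Each global parameter is the average of the clients' values,
    # weighted by how many samples each client trained on.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two clients with two parameters each; the second client holds three
# times as many samples, so its parameters weigh three times as much.
global_model = aggregate_updates([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)  # [2.5, 3.5]
```

Only model parameters and sample counts reach the server in this sketch; the datasets themselves stay with each client, which is the privacy property the abstract describes.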