Description
Submission date: 12 April 2023
Title: Deep Learning for Scalable Data Series Analytics
Thesis supervisor:
Themis PALPANAS (LIPADE)
Co-supervisor:
Ioana ILEANA (LIPADE)
Scientific field: Information and communication sciences and technologies
CNRS theme: Data and knowledge
Abstract: Massive data sequence collections are becoming a reality for virtually every scientific and social domain, and have to be processed and analyzed in order to extract useful knowledge. A key observation is that each sequence has to be processed and analyzed as a single object (rather than as individual points), which is what makes the management and analysis of data sequences a hard problem. Despite recent advances in the field, state-of-the-art solutions seem to have reached their limits, failing to deliver the performance levels (in terms of scalability, accuracy and versatility) required for a large class of important scientific and industrial applications. In this project, we propose the design of a new generation of sequence management and complex analytics methods that, in contrast to traditional approaches, employ models that learn from and adapt to data characteristics and query workloads, and that support analytics on multivariate and varying-length sequences, in order to deliver up to orders of magnitude of improvement in performance. These methods are novel in the context of sequence analytics, necessary for further performance advances in the field, and very challenging due to the high-dimensional (i.e., thousands of dimensions) and large-volume (i.e., TBs to PBs) nature of the problem. The proposed methods will benefit the multitude of applications that need to analyze massive sequence collections (in astrophysics, manufacturing, neuroscience, etc.). Moreover, they will also benefit an emerging large class of applications that analyze collections of high-dimensional data, such as the deep neural network embeddings of various data objects (medical images, traffic monitoring videos, social graphs, molecule structures, and others).