Description
Submission date: 5 April 2024
Title: Deep Learning for Vector Similarity Search
Thesis supervisor:
Themis PALPANAS (LIPADE)
Co-supervisor:
Ioana ILEANA (LIPADE)
Scientific field: Information and communication sciences and technologies
CNRS theme: Data and knowledge
Abstract: Massive collections of high-dimensional vectors are becoming a reality in virtually every scientific and social domain, and they have to be processed and analysed in order to extract useful knowledge. The high dimensionality of these data objects makes their management and analysis a hard problem. Despite recent advances in the field, state-of-the-art solutions seem to have reached their limits, failing to deliver the performance levels (in terms of scalability, accuracy, and versatility) required by a large class of important scientific and industrial applications. In this project, we propose the design of vector management methods that, in contrast to traditional approaches, employ models that learn from and adapt to data characteristics and query workloads, in order to deliver considerable performance improvements. These methods are novel in the context of vector analytics, necessary for further performance advances in the field, and very challenging due to the high dimensionality (thousands of dimensions) and large volume (terabytes to petabytes) of the data. The proposed methods will benefit the many applications that need to analyze massive vector collections (in astrophysics, manufacturing, neuroscience, and other domains), including collections of deep neural network embeddings of various data objects (such as medical images, traffic monitoring videos, social graphs, and molecule structures).