Description
Submission date: 30 October 2024
Title: Vector Similarity Search
Thesis director:
Themis PALPANAS (LIPADE)
Co-supervisor:
Ioana ILEANA (LIPADE)
Scientific field: Information and communication sciences and technologies
CNRS theme: Data and knowledge
Abstract: This PhD project investigates the theoretical foundations of vector similarity search, with the goal of developing robust and efficient mathematical approaches for finding similar items in large datasets. The core challenge is to measure the proximity between vectors efficiently, that is, to compute the distance or similarity between them under a chosen measure such as Euclidean distance or cosine similarity. The primary objective is to develop methods that compute these relationships accurately and, in doing so, reveal the underlying structure and patterns of the data. The project will study dimensionality-reduction techniques, such as principal component analysis (PCA), which simplify high-dimensional datasets while preserving their key features. Working in these reduced spaces will support the design of optimized indexing structures and search algorithms that locate similar items efficiently. Overall, this research aims to deepen our understanding of how to manage and analyze large data collections, and to yield more sophisticated and efficient theoretical frameworks for similarity search.
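The abstract names Euclidean distance, cosine similarity and PCA-based dimensionality reduction as building blocks. As a minimal sketch only, and not the project's own method, the Python below shows exact k-nearest-neighbour search by linear scan under both measures, plus a PCA projection computed by power iteration with deflation, after which the same search runs in the reduced space. All function names (`euclidean`, `cosine_similarity`, `knn_euclidean`, `knn_cosine`, `pca_fit`, `project`) are hypothetical; the linear scan stands in for the optimized indexing structures the project targets; and the code assumes non-zero vectors for cosine similarity and a power-iteration start vector that is not orthogonal to the principal directions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical helpers for illustration; not taken from the thesis.


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_euclidean(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Exact k-NN by linear scan: indices of the k vectors closest to the query."""
    return sorted(range(len(data)), key=lambda i: euclidean(data[i], query))[:k]


def knn_cosine(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Exact k-NN by linear scan: indices of the k vectors with highest cosine similarity."""
    return sorted(range(len(data)), key=lambda i: -cosine_similarity(data[i], query))[:k]


def pca_fit(data: List[Vector], n_components: int,
            iters: int = 500) -> Tuple[List[float], List[List[float]]]:
    """Return (mean, components): top principal directions via power iteration with deflation."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Covariance matrix (divides by n): cov[i][j] = average of centered_i * centered_j
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    components: List[List[float]] = []
    for _ in range(n_components):
        v = [1.0 + j for j in range(d)]  # deterministic start; assumed not orthogonal to the target
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left; stop early
                return mean, components
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance) along v
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next iteration finds the next direction
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def project(vectors: List[Vector], mean: List[float],
            components: List[List[float]]) -> List[List[float]]:
    """Map each vector to its coordinates along the given principal components."""
    return [[sum((x[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for x in vectors]


if __name__ == "__main__":
    data = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
    query = [1.1, 0.0]
    mean, comps = pca_fit(data, n_components=1)
    reduced = project(data, mean, comps)
    (q_reduced,) = project([query], mean, comps)
    print(knn_euclidean(data, query, k=1), knn_euclidean(reduced, q_reduced, k=1))  # prints [3] [3]
```

Projecting onto fewer dimensions makes each distance computation cheaper, but neighbours are only preserved when the discarded components carry little variance; the toy data above lie close to a line, which is why the 1-D search returns the same neighbour as the 2-D one.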
Doctoral candidate: Qi Yanlin