Description
Filing date: 17 October 2024
Title: High-dimensional vector analytics for decision intelligence in data-intensive environments
Thesis director:
Themis PALPANAS (LIPADE)
Supervisor:
George TZAGKARAKIS (FORTH Institute of Computer Science)
Scientific field: Information and communication sciences and technologies
CNRS theme: Data and knowledge
Abstract: Dynamic decision-making in large-scale uncertain systems poses significant challenges in modern data-rich operational environments. Traditional decision support systems struggle to manage vast and complex datasets in real time while also handling high-dimensional problems. In this work, we will develop data-driven, large-scale, high-dimensional vector analytics tools that can be used for decision optimization. The focus will be on similarity search for high-dimensional vectors, an operation that lies at the heart of several downstream analytics tasks and is crucial for analyzing extensive vector collections. We will develop techniques for fast similarity search under the Dynamic Time Warping (DTW) distance, which is widely used for similarity search in data series because of its reliability. However, the cost of computing DTW scales quadratically with the length of the series, which presents substantial challenges for large-scale datasets. To address this limitation, we will explore methodologies in which each vector is embedded into a latent space. In this space, the embeddings are designed to preserve the DTW relationships between series, so that Euclidean distance can be used instead, reducing the cost of each distance computation from quadratic to linear. The proposed work will involve new approaches for vector embedding, with the corresponding models, architectures and loss functions, as well as suitable sampling techniques for efficient training. This approach aims to optimize efficiency while maintaining the accuracy of similarity searches.
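To make the cost argument concrete, the sketch below contrasts the textbook dynamic program for DTW, which fills an (n+1) x (m+1) table, with a Euclidean distance that needs a single pass over the vectors. It is an illustration under stated assumptions, not the method of the thesis: the squared-difference local cost, the function names, and in particular `distance_preservation_loss` (one possible way to penalise an embedding whose Euclidean distance departs from the DTW distance) are all hypothetical choices made here.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW between two 1-D series (textbook dynamic program).

    Fills an (n+1) x (m+1) table, so time and memory grow with n * m.
    The squared-difference local cost is an assumption of this sketch.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return math.sqrt(cost[n][m])


def euclidean_distance(u, v):
    """Euclidean distance between equal-length vectors: one linear pass."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_preservation_loss(emb_x, emb_y, dtw_xy):
    """Hypothetical training objective for one pair of series.

    Squared gap between the Euclidean distance of the two embeddings
    and the DTW distance of the original series.
    """
    return (euclidean_distance(emb_x, emb_y) - dtw_xy) ** 2


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 2.0]
    y = [0.0, 1.0, 2.0, 2.0]
    # y is a time-shifted copy of x: DTW can align the two series,
    # whereas the point-by-point Euclidean distance cannot.
    print("DTW:", dtw_distance(x, y))
    print("Euclidean:", euclidean_distance(x, y))
```

In a learned-embedding setting, the quadratic DTW computation would be needed only to produce training targets such as `dtw_xy`; at query time, only the linear Euclidean comparison between embeddings would run.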
Doctoral candidate: Panourgias Christos