Doctoral research project number: 4041

Description

Submission date: 1 January 1900
Titre: Topic Extraction and Alignment for Large Scientific Document Collections
Thesis supervisor: Bernd AMANN (LIP6)
Thesis supervisor: Hubert NAACKE (LIP6)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined

Abstract: The applicative goal of this thesis is to build phylomemetic trees that represent the temporal evolution of terms or concepts in evolving document archives. (Phylomemetic trees are inspired by phylogenetic trees, which represent the characteristics and evolution of species and are derived from the genes of their members.) Phylomemetic tree construction workflows involve specialized tools for the semantic and temporal analysis of evolving text corpora [1]. However, current versions are limited to the analysis of medium-sized document collections (about 3*10^5 documents) and vocabularies (about 4*10^3 terms). Furthermore, even within this setting, processing the related co-occurrence matrix is still time-consuming and batch-oriented.

Doctoral student: Li Ke