Description
Submission date: 1 January 1900
Title: Topic Extraction and Alignment for Large Scientific Document Collections
Thesis supervisor:
Bernd AMANN (LIP6)
Thesis supervisor:
Hubert NAACKE (LIP6)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
The applied goal of this thesis is to build phylomemetic trees
that represent the temporal evolution of terms or concepts in evolving document archives.
(Phylomemetic trees are inspired by phylogenetic trees, which represent the characteristics
and evolution of species and are derived from the genes of their members.)
Phylomemetic tree construction workflows involve specialized tools for the semantic and temporal
analysis of evolving text corpora [1]. However, current versions are limited to the analysis of medium-sized
document collections (about 3*10^5 documents) and vocabularies (about 4*10^3 terms). Furthermore, even
within this setting, processing the associated co-occurrence matrix is still time-consuming and
batch-oriented.
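To make the co-occurrence matrix concrete, here is a minimal Python sketch that counts, for each pair of vocabulary terms, the number of documents in which both appear. It is an illustration under stated assumptions, not the workflow of [1]: the function name cooccurrence_counts, the document-level counting rule, and the toy data are all hypothetical. With about 4*10^3 terms there are up to roughly 8*10^6 unordered term pairs, which is why the sketch keeps only the pairs that actually occur.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each unordered term pair, the documents containing both terms.

    Hypothetical helper for illustration only: `documents` is an iterable of
    token lists and `vocabulary` is the set of terms to track.
    """
    vocab = set(vocabulary)
    counts = Counter()
    for tokens in documents:
        # Keep each vocabulary term once per document, in a canonical order,
        # so that (a, b) and (b, a) are counted as the same pair.
        present = sorted(vocab.intersection(tokens))
        counts.update(combinations(present, 2))
    return counts


# Toy corpus (assumed data, not taken from the thesis).
docs = [
    ["topic", "model", "corpus"],
    ["topic", "corpus", "archive"],
    ["model", "archive"],
]
vocabulary = ["topic", "model", "corpus", "archive"]
counts = cooccurrence_counts(docs, vocabulary)
print(counts[("corpus", "topic")])  # number of documents containing both terms
```

The sparse Counter stores only observed pairs. This single pass is still batch-style, since the counts are rebuilt from the whole corpus each time, which is the limitation the abstract points to.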
Doctoral student: Li Ke