Description
Submission date: 1 January 1900
Title: Linguistic normalization for speaker diarization and speaker recognition
Thesis supervisor:
Nicholas EVANS (Eurecom)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
Many automatic speech processing algorithms make numerous, sometimes unrealistic or restrictive, assumptions about the use-case scenario.
For example, some algorithms require low noise levels, low reverberation, a particular grammar, or the presence of a single speaker.
This proposal relates to the handling of speech from multiple speakers, a problem referred to as speaker diarization. Otherwise known as the ‘who spoke when?’ problem, speaker diarization involves the detection of speaker turns within an audio document (segmentation) and the grouping together of all same-speaker segments (clustering). Recent work shows that the performance of state-of-the-art algorithms can be degraded by nuisance variation such as linguistic influences. This thesis will develop new normalization approaches to attenuate linguistic influences and improve speaker diarization performance. The work will also consider the potential of linguistic normalization in the field of short-interval speaker recognition, where performance can be degraded by the linguistic imbalance between training and testing data. Finally, the work will be conducted alongside Master’s projects that investigate prototype algorithms for smartphone platforms, including the convergence of speaker diarization and recognition with source separation, and new applications in distributed, collaborative and real-time speech processing.
Doctoral candidate: Soldi Giovanni