Description
Submission date: 1 January 1900
Title: Signal and image processing study of a non-vocalized speech interface for medical and telecommunication applications
Thesis supervisor:
Bruce DENBY (Institut Langevin (EDITE))
Scientific field: Information and communication science and technology
CNRS theme: Not defined
Abstract:
In the past few years there has been significant interest in a new
type of speech recognition system using sensors that can give direct
information on the movement of the articulators (essentially the
tongue and lips), for example, ultrasound imaging or electromyographic
sensors. Such systems can restore the ability to speak for persons who
have lost the use of their vocal cords due to cancer or injury, and
can also enable speech recognition in very noisy environments where
the acoustic signal is corrupted. A third very interesting possibility
is the so-called Silent Speech Interface, which would allow a user to
engage in spoken communication without activating his or her vocal
cords, thus providing a very secure and unobtrusive means of verbal
communication in a variety of everyday situations; cellphone
manufacturers are in particular very interested in this last
possibility.
At the Sigma Lab, ESPCI ParisTech, we have for a few years now been
developing a Silent Speech Interface based on real-time ultrasound
imaging of the tongue and video imaging of the lips during speech, and
we have recently begun to obtain some very interesting results. The
purpose of the PhD thesis will be to use image and signal processing techniques, including machine learning, to improve system performance, using as a basis a large, recently acquired database of multispeaker silent speech ultrasound and video data. The goal will then be to move towards a high-performance and genuinely practical system with applications in a wide range of areas in both medicine and industry.
Doctoral student: Xu Kele