Description
Filing date: 1 January 1900
Title: Super-wide bandwidth extension
Thesis supervisor:
Nicholas EVANS (Eurecom)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
Background
To improve the perceived quality of speech communication systems, so-called wideband standards have been developed in recent years. One example is the adaptive multi-rate wideband (AMR-WB) speech codec standardised by the 3rd Generation Partnership Project (3GPP). The current trend is towards super-wideband (SWB) speech signals with an acoustic bandwidth in excess of 14 kHz, e.g. the Enhanced Voice Services (EVS) codec developed by 3GPP and the Opus codec used by Skype.
Both narrowband and wideband infrastructures are likely to co-exist for some time. There is thus a need to ensure the backward compatibility of narrowband technologies with current and future wideband infrastructure. Consequently, artificial means of extending narrowband speech to wideband speech have been investigated over the last 20 years, e.g. [1, 2, 3, 4]. This technology is known as artificial bandwidth extension (BWX).
Most of the recent approaches to BWX are based on some form of joint-density modelling in which the missing wideband spectral components are estimated from the available narrowband components, e.g. [1, 3]. These statistical approaches focus on extending the narrowband spectral envelope while avoiding the discontinuity artefacts typical of earlier vector quantisation (VQ) approaches. The work in [3] was among the first to explore a statistical approach that combines Gaussian mixture models (GMMs) of narrowband and wideband speech signals, a joint-density estimation procedure and a traditional source-filter speech model. This work was inspired by developments in voice transformation and conversion, e.g. [5, 6]. Most subsequent work has kept the same source-filter approach and varied the statistical model used to extend the narrowband spectral envelope, e.g. via a hidden Markov model (HMM) approach [4].
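The joint-density idea can be illustrated with a short sketch. It assumes a GMM has already been trained (e.g. by expectation-maximisation) on stacked vectors of narrowband features x and missing-band features y, and computes the standard minimum mean-square error estimate of y given x. The function names, the feature layout and the parameter `dx` are illustrative assumptions, not taken from [1, 3].

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def estimate_highband(x, weights, means, covs, dx):
    """MMSE estimate of the missing-band features y from narrowband features x.

    Assumption: each joint mean is [mu_x, mu_y] and each joint covariance is
    [[Sxx, Sxy], [Syx, Syy]], with the first dx dimensions belonging to x.
    """
    x = np.asarray(x, dtype=float)
    # Posterior probability of each mixture component given x alone.
    post = np.array([
        w * gaussian_pdf(x, m[:dx], c[:dx, :dx])
        for w, m, c in zip(weights, means, covs)
    ])
    post = post / post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    y_hat = np.zeros(means[0].shape[0] - dx)
    for p, m, c in zip(post, means, covs):
        gain = c[dx:, :dx] @ np.linalg.inv(c[:dx, :dx])
        y_hat += p * (m[dx:] + gain @ (x - m[:dx]))
    return y_hat
```

With a single component the estimate reduces to linear regression of y on x; with several components the posterior weights blend the per-component regressions smoothly, which is how this family of methods avoids the hard codebook switching of VQ.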
The work proposed here aims to extend the statistical approaches of the past and to apply the very latest developments in voice conversion to BWX. Three approaches are planned. The first is an extension of the solution in [4] to SWB. The second is a less complex, low-cost approach based on, for instance, a simple replica of the spectrum, with potential for rapid integration into Intel products. The third was originally proposed in a study of the vulnerability of automatic speaker recognition to spoofing through voice conversion. That algorithm essentially converts one speaker's voice towards that of a target speaker using two speaker models: one of the source speaker and one of the target speaker. This PhD programme will adapt the technique to BWX and SWB by using models of different bandwidths rather than of different speakers.
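As a rough illustration of what a low-cost spectral replica might look like, the sketch below copies the magnitude spectrum of the available band into the missing upper band and attenuates it by a fixed gain. This is only one possible reading of "a simple replica of the spectrum"; the function name, the fixed gain and the assumption that the upper band has the same number of bins as the lower band are hypothetical.

```python
def replicate_spectrum(low_band, gain_db=-12.0):
    """Extend a magnitude spectrum by appending an attenuated copy of itself.

    low_band: magnitudes of the available band for one frame (lowest bin first).
    gain_db:  assumed fixed attenuation applied to the replicated band.
    """
    gain = 10.0 ** (gain_db / 20.0)          # convert dB to a linear factor
    high_band = [gain * m for m in low_band]  # replica of the lower band
    return list(low_band) + high_band
```

A usable system would also need to shape the replicated band, for example with an estimated spectral envelope, since a fixed gain ignores how the energy of the upper band varies between speech sounds.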
Doctoral student: Bachhav Pramod