Doctoral research project number: 2743


Submission date: 1 January 1900
Title: Modelling and transformation of sound textures and environmental sounds
Thesis supervisor: Xavier RODET (STMS)
Thesis supervisor: Axel ROEBEL (STMS)
Scientific domain: Information and communication science and technology
CNRS theme: Not defined

Abstract:

Introduction

The majority of research on the representation and transformation of sound has been devoted to music and speech. For these sounds, very high quality transformations can be achieved, notably using sinusoidal modeling techniques or phase-vocoder-based representations [Laroche:99, Roebel:03]. Recently, the need for synthesis, modeling and transformation of other sound classes has emerged, notably for the rendering of virtual-reality scenes, for example in video games. These classes cover everyday sounds arising from a natural environment: footsteps, moving cars, rain, wind, the brushing of clothes, fire, etc. While containing a high degree of noise, these sounds nevertheless possess a characteristic structure that, compared to models of musical and speech signals, requires extended modeling strategies. Although research activity in this area has been comparatively limited, a number of potentially interesting approaches exist [Strobl:06, Dubnov:02, Athineos:03, Zhu:04, Verron:10, Lagrange:10].

Research topics

The central topic of this PhD thesis is the representation and transformation of environmental sounds. The objective is to devise signal representations that allow analyzing a given sound signal with parameterizations that support intuitive transformations. Example transformations are: increasing the duration of a sound without changing its perceived characteristics; changing physical parameters of the sound source without changing the duration, e.g. its perceived size (the size of a person making footsteps, the density of rain); or even changing the materials of the sound sources (rain falling on different materials). The parameterization presented to the user should be intuitive, abstracting as much as possible from the internal representation of the sound.
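The time-scale transformation mentioned above can be illustrated with the classic phase-vocoder loop: analysis frames are taken at a hop scaled by the stretch rate, and bin phases are propagated coherently at the synthesis hop. The following is a minimal Python sketch of this textbook scheme, not the IRCAM implementation; the frame size and hop are illustrative choices.

```python
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=1024, hop=256):
    """Time-stretch x by a factor 1/rate (rate < 1 lengthens the sound).

    A simplified textbook phase vocoder: magnitudes are reused, phases are
    accumulated from the estimated instantaneous frequency of each bin.
    """
    window = np.hanning(n_fft)
    # Analysis frame positions: stepping by hop*rate stretches time by 1/rate.
    times = np.arange(0, len(x) - n_fft - hop, hop * rate)
    # Expected phase advance per synthesis hop for each FFT bin.
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft
    phase = np.angle(np.fft.rfft(window * x[:n_fft]))
    out = np.zeros(len(times) * hop + n_fft)
    for i, t in enumerate(times):
        t = int(t)
        spec1 = np.fft.rfft(window * x[t:t + n_fft])
        spec2 = np.fft.rfft(window * x[t + hop:t + hop + n_fft])
        # Heterodyned phase increment, wrapped to [-pi, pi]; this is the
        # deviation of each bin from its nominal center frequency.
        dphi = np.angle(spec2) - np.angle(spec1) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase so successive synthesis frames stay coherent.
        phase += omega + dphi
        frame = np.fft.irfft(np.abs(spec2) * np.exp(1j * phase), n_fft)
        # Windowed overlap-add at the synthesis hop (no gain normalization).
        out[i * hop:i * hop + n_fft] += window * frame
    return out
```

On stationary sinusoids this works well; on noise it produces the familiar "phasiness" and loss of texture, which is exactly the weakness of the baseline that the research plan below proposes to address.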
As a baseline system for comparison, the extended phase vocoder implemented at IRCAM should be used. Improvements of this baseline related to the modification of noise sounds should be investigated.

Research plan

The thesis will start with an extensive overview of the state of the art in the representation of sound textures and environmental sounds, clarifying the properties and potential of existing methods. In this phase the research is based on theoretical investigation; however, whenever necessary and possible without extensive effort, the representation properties should also be clarified by means of experiments. The investigation should include standard tools such as the phase vocoder. The result of this first phase is a basic understanding of the signal properties required to successfully apply a given signal model to the transformation of a sound. As a first step towards improving existing methods, the problem of time stretching of noise within the phase vocoder should be investigated. Next, based on the introductory investigation, potentially interesting signal representation strategies will be implemented and signal transformation strategies will be devised. The most promising algorithms will be implemented and the related transformation strategies will be evaluated experimentally.

References

[Laroche:99] J. Laroche and M. Dolson, "Improved Phase Vocoder Time-Scale Modification of Audio", IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 323-332, 1999.
[Roebel:03] A. Roebel, "A New Approach to Transient Processing in the Phase Vocoder", Proc. of the 6th Int. Conf. on Digital Audio Effects (DAFx-03), pp. 344-349, 2003.
[Strobl:06] G. Strobl, G. Eckel, D. Rocchesso, S. le Grazie, "Sound Texture Modeling: A Survey", Proc. of the 2006 Sound and Music Computing (SMC) International Conference, pp. 61-65, 2006.
[Dubnov:02] S. Dubnov, Z. Bar-Joseph, R. El-Yaniv, D. Lischinski, M. Werman, "Synthesizing Sound Textures through Wavelet Tree Learning", IEEE Computer Graphics and Applications, vol. 22, no. 4, pp. 38-48, Jul/Aug 2002.
[Athineos:03] M. Athineos, D. P. W. Ellis, "Sound Texture Modelling with Linear Prediction in Both Time and Frequency Domains", Proc. ICASSP, vol. V, pp. 648-652, 2003.
[Zhu:04] X. Zhu, L. Wyse, "Sound Texture Modeling and Time-Frequency LPC", Proc. DAFx, 2004.
[Lagrange:10] M. Lagrange, G. Scavone, P. Depalle, "Analysis/Synthesis of Sounds Generated by Sustained Contact between Rigid Objects", IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 3, 2010.
[Verron:10] C. Verron, "Synthèse immersive de sons d'environnement" (Immersive synthesis of environmental sounds), PhD thesis, Université Aix-Marseille I, 2010.

Doctoral candidate: Liao Wei-Hsiang