Leonardo Digital Reviews

Fractal Speech Processing

by Marwan Al-Akaidi
Cambridge University Press, Cambridge, UK, 2004
214 pp., illus. 63 b/w, Trade, $110.00
ISBN: 0-521-81458-8.

Reviewed by Stefaan Van Ryssen
Hogeschool Gent
Jan Delvinlaan 115, 9000 Gent, Belgium

stefaan.vanryssen@pandora.be

The field of digital speech processing has several branches, the main ones being text-to-speech (speech synthesis), speech recognition, and speaker identification. Each discipline has seen spectacular advances in the past decades, from the earliest synthetic voices of the 1960s to very advanced real-time dictation programs. The industry has been booming, with companies rising and disappearing at a rate similar to, if not as dramatic as, that of the internet bubble. Fiascos like the fraudulent collapse of the Belgian company Lernout & Hauspie have reached the newspaper headlines, all but hiding many success stories from the public eye. With media attention now somewhat abated, it is easy to forget that the underlying mathematical theory of speech processing has advanced at a steady pace.

Marwan Al-Akaidi, professor at De Montfort University, UK, senior member of the Institute of Electrical and Electronics Engineers (IEEE) and Chairman of the IEEE UKRI Signal Processing Society, is certainly the right person to introduce the newly developing technique of fractal analysis in digital speech processing. Although fractal techniques have been widely used in image processing, the application of fractals in speech processing is relatively new. This book represents the fruit of research carried out at De Montfort University.

The first half of the book (chapters one to three) covers traditional techniques like the fast Fourier transform, digital filtering, and estimation algorithms. Written for engineers and academics, these chapters move at a quick pace and focus on computational methods rather than applications and practical results. Readers who are well acquainted with the field can easily skip them, since they only summarize established knowledge. Chapters four and five give a quick overview of the history of fractals and the fundamentals of fractal analysis, connecting the concepts of waveform, Fourier transform, and fractal dimension. This is where the book really starts.

In chapter six, 'Speech processing with fractals', the basic techniques for the use of fractals in speech processing (here mainly recognition) are covered, while chapter seven is about speech synthesis. It is unclear from this book what advantages fractal techniques actually bring to speech synthesis. The field appears somewhat stalled, with Al-Akaidi discussing the use of syllables, demisyllables (initial and final parts of syllables), phones and diphones (basically transitional elements connecting two more or less 'stable' sounds like the middle of a vowel) but not pointing to any progress. Finally, in chapter eight, Al-Akaidi discusses some possible applications of fractal signal processing in cryptology and chaos theory.

All in all, this is a valuable reference book for engineers and academics who intend to contribute to applied research. The range of examples and applications Al-Akaidi points to is impressive and may be a source of inspiration for future developments in this adolescent technology.