Figure: Example of a speech signal which transitions from an unvoiced to a voiced sound
There are many types of speech coders, but the one used in the DMR (Digital Mobile Radio), D-STAR and System Fusion systems is a commercial speech coder called AMBE. This vocoder (voice coder) can reduce the bit rate of a speech signal to somewhere between 2 and 9.6 kbps. Amateur radio systems operate at the lower end of this range. This means that the coded speech signal takes up only about 3-4% of the data that the uncoded signal would have needed. To achieve this, one needs to exploit properties of how the speech signal is generated in the interaction between the vocal cords and the oral cavity.
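As a rough check of the 3-4% figure, the sketch below compares coded bit rates with an assumed uncoded reference of 64 kbps (8 kHz sampling, 8 bits per sample, as in ordinary telephone PCM). The reference format is an assumption made here for illustration; the text does not say which uncoded format it has in mind.

```python
# Assumed uncoded reference: telephone-quality PCM (an assumption, not from the text).
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
UNCODED_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bit/s


def coded_fraction(coded_bps, uncoded_bps=UNCODED_BPS):
    """Return the coded bit rate as a fraction of the uncoded bit rate."""
    return coded_bps / uncoded_bps


if __name__ == "__main__":
    # 2000 bit/s is the low end of the AMBE range, 2450 bit/s is the DMR voice rate.
    for rate_bps in (2000, 2450):
        percent = 100 * coded_fraction(rate_bps)
        print(f"{rate_bps} bit/s is {percent:.1f} % of the uncoded rate")
```

With this reference, both rates land between 3% and 4%, which is consistent with the figure quoted above.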
The first figure shows a section of a speech signal that involves the transition from an unvoiced sound, "s", to a voiced sound, "e" (as in "send"). The unvoiced sound looks like noise, and in this case it is generated by friction as air flows out between the teeth. The voiced part has a characteristic repetition period of about 11 ms. This is the pitch period, and the pitch frequency is its inverse, i.e. about 90 Hz. Its source is oscillation in the vocal cords, which one can feel by touching the Adam's apple during a voiced sound. The rapid oscillation between the pitch excitation pulses is due to resonances in the oral cavity.
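The relation between the 11 ms period and the 90 Hz pitch can be illustrated with a small experiment. The sketch below builds a toy voiced signal (one decaying 1 kHz burst per pitch period, standing in for a single oral cavity resonance) and recovers the pitch from the autocorrelation peak. The 8 kHz sampling rate, the 1 kHz resonance, the decay time and the 50-400 Hz search range are all assumptions chosen for illustration. Autocorrelation is only one common way to estimate pitch and is not claimed to be what AMBE does.

```python
import math

FS = 8000  # assumed sampling rate in Hz


def synth_voiced(period_s=0.011, duration_s=0.1, fs=FS):
    """Toy voiced sound: one decaying 1 kHz burst per pitch period."""
    period = round(period_s * fs)          # 88 samples for 11 ms at 8 kHz
    n_total = round(duration_s * fs)
    signal = []
    for n in range(n_total):
        t = (n % period) / fs              # time since the latest excitation pulse
        signal.append(math.exp(-t / 0.002) * math.sin(2 * math.pi * 1000 * t))
    return signal


def estimate_pitch_hz(signal, fs=FS, f_min=50.0, f_max=400.0):
    """Return fs / lag for the lag with the largest autocorrelation value."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag


if __name__ == "__main__":
    pitch = estimate_pitch_hz(synth_voiced())
    print(f"Estimated pitch: {pitch:.1f} Hz")
```

For this synthetic signal the strongest peak should fall at a lag of 88 samples, i.e. 8000 / 88, or roughly 90.9 Hz. Real speech is far less regular, so practical pitch estimators need additional safeguards.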
A block diagram of how speech is generated in a speech decoder is shown in the next figure. The filter, an Infinite Impulse Response (IIR) filter usually of order 10, recreates the resonances in the oral cavity. The most important information in the voice usually has the vocal cord pulses as its source. At low bit rates, the part of the excitation that encodes these pulses is therefore given higher priority than the more random components that are also part of the sound. This easily results in a buzzing, metallic sound in digital speech, where the individual characteristics of each speaker's voice tend to be lost.
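The decoder structure described above, an excitation signal driving an all-pole IIR filter, can be sketched in a few lines. This is a generic source-filter toy and not the AMBE algorithm, whose details are proprietary. The sampling rate, the five resonance frequencies and bandwidths, and the `voicing` mixing parameter are hypothetical values chosen only so that the filter ends up with order 10.

```python
import math
import random

FS = 8000  # assumed sampling rate in Hz


def all_pole_coefficients(resonances, fs=FS):
    """Build denominator coefficients from (frequency, bandwidth) pairs in Hz."""
    a = [1.0]
    for freq_hz, bandwidth_hz in resonances:
        r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius, below 1
        theta = 2 * math.pi * freq_hz / fs           # pole angle
        section = [1.0, -2 * r * math.cos(theta), r * r]
        product = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):                   # polynomial multiplication
            for j, sj in enumerate(section):
                product[i + j] += ai * sj
        a = product
    return a


def excitation(n_samples, pitch_hz, voicing, fs=FS, seed=0):
    """Mix a pitch pulse train (voiced part) with white noise (unvoiced part)."""
    rng = random.Random(seed)
    period = round(fs / pitch_hz)
    out = []
    for n in range(n_samples):
        pulse = 1.0 if n % period == 0 else 0.0
        noise = rng.gauss(0.0, 0.05)
        out.append(voicing * pulse + (1.0 - voicing) * noise)
    return out


def all_pole_filter(x, a):
    """IIR synthesis filter: y[n] = x[n] - sum over k >= 1 of a[k] * y[n - k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Hypothetical resonances: (centre frequency, bandwidth) in Hz.
RESONANCES = [(500, 80), (1500, 100), (2500, 120), (3300, 150), (3700, 200)]

if __name__ == "__main__":
    a = all_pole_coefficients(RESONANCES)            # 11 coefficients, order 10
    voiced = all_pole_filter(excitation(800, 90.0, voicing=1.0), a)
    unvoiced = all_pole_filter(excitation(800, 90.0, voicing=0.0), a)
    print("filter order:", len(a) - 1)
    print("samples generated:", len(voiced), len(unvoiced))
```

Setting `voicing` to 1.0 gives a pure pulse excitation, which is the buzzy extreme described above, while 0.0 gives a noise-only excitation resembling an unvoiced sound. A real decoder updates the filter coefficients and excitation parameters frame by frame from the received bits.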
AMBE allows for variable bit rates, so the speech quality may vary between the systems in use, depending on channel quality and other factors. As an example, DMR transmits voice at a bit rate of 2.45 kbps. This is very low compared to typical mobile phone bit rates, which can be in the range of 6.5 to 12 kbps, and this explains much of the degradation in sound quality.
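To get a feeling for how little information these bit rates leave for the speech parameters, the sketch below converts them to bits per frame. The 20 ms frame length is an assumption (a common choice in speech coders) and is not stated in the text.

```python
FRAME_MS = 20  # assumed frame length in milliseconds


def bits_per_frame(bit_rate_bps, frame_ms=FRAME_MS):
    """Number of bits available to describe one frame of speech."""
    return bit_rate_bps * frame_ms / 1000


if __name__ == "__main__":
    for label, rate_bps in (("DMR voice", 2450),
                            ("mobile phone, low end", 6500),
                            ("mobile phone, high end", 12000)):
        print(f"{label}: {bits_per_frame(rate_bps):.0f} bits per {FRAME_MS} ms frame")
```

Under this assumption, the DMR voice rate leaves 49 bits per frame to describe the pitch, the voicing and the filter, compared with 130 to 240 bits at the mobile phone rates mentioned above.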
The principle of analyzing the voice signal to find the parameters is much more complicated than the recovery or synthesis shown here and is therefore not included.
This has been a brief description of the principles behind low bitrate speech coding. It was first written for the chapter on digital signal processing in the revised textbook for Norwegian radio amateurs due later this year.