ITU-T Recommendation
From Dr. Mashiur Rahman :: ICT expert :: VoIP & Nanotechnology
Digital Speech Interpolation
Digital speech interpolation (DSI) is combined with adaptive differential PCM (ADPCM), using a speech detector to detect speech signals and to discriminate between voiced and unvoiced sounds. Adaptive quantization bit assignment is applied to the speech to cope with any freeze-out condition. In addition, PCM speech signals sampled at 8 kHz are shifted down by 250 Hz and converted to a 6 kHz sampling frequency before being applied to ADPCM, thereby attaining a total gain of about 7 without degrading speech quality. [1]
To better understand digital speech interpolation, we first review the basic methods of DPCM and ADPCM.
DPCM
Differential pulse code modulation (DPCM) is a procedure for converting an analog signal into a digital signal: the analog signal is sampled, and the difference between each actual sample value and its predicted value (the prediction is based on the previous sample or samples) is quantized and encoded as a digital value. The principle behind DPCM is that the source is likely to be an analogue signal whose amplitude changes quite gradually; large jumps in amplitude over a short time are unlikely. The signal can therefore be represented efficiently by an initial value followed by incremental deltas. Since these differences are likely to be small, fewer bits are needed to encode the signal, and throughput may be increased. [2]
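The idea of an initial value followed by deltas can be sketched in a few lines of Python. This is only an illustration, assuming integer samples, a predictor that uses the previous sample alone, and no quantization of the differences (so the round trip is lossless); the function names dpcm_encode and dpcm_decode are hypothetical and do not come from any standard.

```python
def dpcm_encode(samples):
    """Return the first sample followed by successive differences.

    Assumption: the predictor is simply the previous sample, and the
    differences are not quantized, so nothing is lost.
    """
    if not samples:
        return []
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples


signal = [100, 102, 105, 107, 108, 108, 106, 103]
encoded = dpcm_encode(signal)
print(encoded)  # [100, 2, 3, 2, 1, 0, -2, -3]
assert dpcm_decode(encoded) == signal
```

After the first value, every transmitted number is small in magnitude, which is why fewer bits suffice for a slowly varying signal. A practical codec would also quantize the differences, which is where loss can enter.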
ADPCM
A specialization of Differential Pulse Code Modulation, Adaptive Differential Pulse Code Modulation (ADPCM) uses predictive techniques to increase the efficiency of information coding. ADPCM is employed in many modern audio and video compression algorithms, including H.323 Video Conferencing, Voice over IP, and the DECT and WDCT digital cordless phone standards. Differential PCM encodes each discrete information symbol by transmitting only the difference between the current signal and its predecessor. For most real-world signals, this can be far more efficient than plain Pulse Code Modulation, because fewer bits are needed to represent the deltas than the total magnitude. Adaptive DPCM works similarly to DPCM, but uses a pre-defined algorithm on each side to predict the likely value of the next information symbol delta; what is transmitted is the difference between this prediction and the actual value. A highly accurate prediction algorithm, especially with the output symbols Shannon-Fano encoded, gives a highly efficient transmission method. A simple example algorithm is to presume that the next delta will be the same as the previous one. For an equally simple example input, a triangular wave, the encoding scheme works as follows:
A   ^
m   |
p 7 |                   -
l 6 |                -#
i 5 |             -#  #  #
t 4 |          -#  #  #  # -#                -#
u 3 |       -#  #  #  #  #  # -#          -#  #
d 2 |     #  #  #  #  #  #  #  # -#     #  #  #
e 1 |  #  #  #  #  #  #  #  #  #  # -#  #  #  #
  0 |                                  -
    +-------------------------------------------> Time
The sampled input is shown as # symbols, and the predicted next output is shown as - characters. A prediction is possible only after the first and second samples have been sent; the first and all subsequent predictions are accurate, except at the turning points. A table of the source, delta, prediction, and error is shown below:
| Source    |  1  2  3  4  5  6  5  4  3  2  1  2  3  4 |
| Prev. Out |  -  1  2  3  4  5  6  5  4  3  2  1  2  3 |
| Delta     |  -  1  1  1  1  1 -1 -1 -1 -1 -1  1  1  1 |
| Predicted |  -  -  3  4  5  6  7  4  3  2  1  0  3  4 |
---------------------------------------------------------
| Error     |  -  -  0  0  0  0 -2  0  0  0  0  2  0  0 |
---------------------------------------------------------
It is the last line that is encoded and transmitted. For this simple case there are only three possible outputs: the most common is "no difference" (0), meaning the prediction was correct, while every peak or trough of the triangle wave produces a prediction error of ±2. The signal can therefore be coded with only three symbols, requiring ceil(log2(3)) = 2 bits per source symbol (although alphabet extension can increase efficiency still further). As with differential pulse code modulation, ADPCM may be lossy or lossless, depending on how the prediction error is encoded. CCITT's G.726 voice-compression standard is lossy; for encoding voice or other audio, the difference in output quality is often unnoticeable.
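The table can be reproduced with a short Python sketch of the "next delta equals the previous delta" predictor. It is an illustration only: the function names prediction_errors and reconstruct are hypothetical, the first two samples are assumed to be sent unencoded, and the errors are left unquantized, so this corresponds to the lossless case.

```python
import math


def prediction_errors(source):
    """Predict each sample as previous sample + previous delta.

    Returns (predicted, errors); entries are None where no prediction
    is possible, i.e. for the first two samples.
    """
    head = min(2, len(source))
    predicted = [None] * head
    errors = [None] * head
    for i in range(2, len(source)):
        last_delta = source[i - 1] - source[i - 2]
        guess = source[i - 1] + last_delta
        predicted.append(guess)
        errors.append(source[i] - guess)
    return predicted, errors


def reconstruct(first_two, errors):
    """Receiver side: repeat the same prediction and add each error."""
    out = list(first_two)
    for e in errors:
        out.append(2 * out[-1] - out[-2] + e)
    return out


source = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 4]
predicted, errors = prediction_errors(source)
print(predicted[2:])  # [3, 4, 5, 6, 7, 4, 3, 2, 1, 0, 3, 4]
print(errors[2:])     # [0, 0, 0, 0, -2, 0, 0, 0, 0, 2, 0, 0]

symbols = sorted(set(errors[2:]))
bits = math.ceil(math.log2(len(symbols)))
print(symbols, bits)  # [-2, 0, 2] 2

assert reconstruct(source[:2], errors[2:]) == source
```

Because the receiver runs the same predictor, the error stream alone (plus the two starting samples) is enough to rebuild the waveform exactly.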
References
- [1] Digital Speech Interpolation System. 1981, Yohtaro Yatsuzuka.
- [2] http://everything2.com