Recent Research in Scalable Audio Coding

Summary: We recently conducted a human subjective trial to quantify the perceptual distortion introduced by MPEG-4 scalable bit-sliced arithmetic coding (BSAC) at low bit rates. Because we were dealing with large-scale impairments, we chose the Comparison Category Rating (CCR) approach, in which pairs of sequences are compared and the subject rates their relative quality on a numeric scale from -3 (sequence A is much better than sequence B) to +3 (vice versa). The results of our trial, for seven input test sequences and 20 human subjects, are shown in Table 1. Here, we compare scalable BSAC with non-scalable MPEG Advanced Audio Coding (AAC) and with transform-domain weighted interleave vector quantization (TwinVQ, abbreviated TVQ) at output rates of 16 and 32 kbits/sec. For scalable BSAC, encoding is performed at 64 kbits/sec.

The table shows no statistically significant penalty for using BSAC at 32 kbits/sec, but a very significant performance penalty at 16 kbits/sec (TVQ scored better by 2.2 out of a maximum of 3.0). We also consider a case in which the input is passed through a lowpass filter (~6 kHz cutoff frequency) prior to encoding. This filtering provides a statistically significant improvement in the perceived quality of the reconstructed audio. We are currently using a variety of analysis techniques to isolate and quantify the most perceptually annoying errors introduced by the encoding process, in an effort to develop better scalable coding algorithms. A preprint of a paper discussing our perceptual testing results and their analysis should be available for download in mid-November 2001.



Table 1: Results of human subjective testing.

Comparison            Mean     Std. Deviation   99% Conf. Int.
tvq16 / bsac16         2.2     0.64             ± 0.36
filtered / bsac16      1.17    0.92             ± 0.52
tvq32 / bsac32        -0.17    0.98             ± 0.55
aac32 / bsac32         0       0.71             ± 0.4
total32 / bsac32      -0.08    0.85             ± 0.4
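
The columns of Table 1 can be reproduced from raw CCR ratings with a few lines of code. The following is a minimal sketch, not the procedure used in the trial: it assumes a normal approximation for the 99% confidence interval (the page does not state how its intervals were computed), and the function names and the example ratings are hypothetical.

```python
import math
import statistics


def ccr_summary(scores, confidence=0.99):
    """Return (mean, sample std. deviation, CI half-width) for CCR ratings.

    Assumption: the interval uses a normal approximation,
    half_width = z * std / sqrt(n). The original trial may have used
    a different method (for example a Student's t interval).
    """
    if len(scores) < 2:
        raise ValueError("need at least two ratings")
    for s in scores:
        if not -3 <= s <= 3:
            raise ValueError(f"CCR rating out of range: {s}")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(len(scores))
    return mean, std, half_width


def excludes_zero(mean, half_width):
    """True when the interval mean +/- half_width does not contain zero,
    i.e. the preference between the two coders is statistically significant."""
    return abs(mean) > half_width


# Hypothetical ratings from 20 subjects for one comparison (not trial data).
example_scores = [3, 2, 2, 3, 1, 2, 3, 2, 2, 3, 2, 1, 3, 2, 2, 3, 2, 2, 3, 2]
mean, std, half_width = ccr_summary(example_scores)
print(f"mean={mean:.2f} std={std:.2f} 99% CI=+/-{half_width:.2f}")
print("significant:", excludes_zero(mean, half_width))
```

Applied to the published rows, the same zero-exclusion check agrees with the summary: 2.2 ± 0.36 and 1.17 ± 0.52 exclude zero, while the three 32 kbits/sec intervals all contain it.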



Sample Sequences: A few of the sample sequences used in our perceptual tests are included here and can be downloaded by clicking on them. Because they are stored as uncompressed WAV files, downloading them over a modem will likely be very slow.



Table 2: Selected audio test sequences.


Audio Sequence          Scalable BSAC           TwinVQ                 Filtered              Nonscalable AAC
Pat Benatar, original   benetar_bsac, 16 kb/s   benetar_tvq, 16 kb/s   benetar_f, 16 kb/s    N/A
Quartet, original       quar_bsac, 16 kb/s      quar_tvq, 16 kb/s      quar_f, 16 kb/s       N/A
Excaliber, original     exc_bsac, 32 kb/s       exc_tvq, 32 kb/s       N/A                   exc_aac, 32 kb/s
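
The "Filtered" entries correspond to the case in which the input is lowpass filtered (~6 kHz cutoff) before encoding. The page does not say which filter was used, so the following is only an illustrative sketch of such a prefilter: a Hamming-windowed sinc FIR design. The function names, the 101-tap length, and the 44.1 kHz sample rate are all assumptions.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc lowpass FIR filter.

    Assumption: this is one common design, not necessarily the filter
    used to produce the filtered test sequences.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, centered filter")
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal lowpass impulse response (sinc), with the k == 0 limit handled.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)  # normalize to unity gain at DC
    return [t / gain for t in taps]


def apply_fir(taps, samples):
    """Filter samples by direct convolution, centered so the output is
    time-aligned with the input (zero padding at both edges)."""
    mid = len(taps) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out


# Assumed sample rate; the test sequences' actual rate is not stated.
SAMPLE_RATE_HZ = 44100
prefilter = lowpass_fir(6000, SAMPLE_RATE_HZ)
```

A prefilter like this removes content above the cutoff before it reaches the encoder, so the 16 kb/s budget is spent on the band that remains; calling `apply_fir(prefilter, samples)` on a list of PCM samples returns the band-limited signal.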