Audio/Image Compression

(Application of Discrete Fourier Cosine Transforms)

Audio Compression

Audio compression is a technique for retaining the relevant audio information in a smaller storage format.  This can be achieved in a number of ways; the broad categories of techniques are as follows:

  1. Loss-less compression - This technique exploits repeating patterns in the audio file to reduce the size of the data file.  The simplest and most obvious method is to check whether a stereo file contains only mono data.  If so, both audio channels are identical and can, therefore, be transmitted as one channel rather than two.
  2. Frequency filtering - This technique transforms the time-dependent audio file into the frequency domain by use of a Fast Fourier Transform (FFT).  Non-existent frequencies (loss-less compression) or insignificant frequencies (lossy compression) are removed from the audio data, and the result is stored in a format that takes advantage of the missing frequencies.  The common MP3 format uses this technique to compress files by a factor of about 8 to 10.  MP3 converters have a compression parameter which allows you to increase the compression ratio at the expense of sound quality.  MP3 files are usually encoded in stereo.  Other formats, such as WMA, have a means of identifying mono audio and, therefore, provide additional compression by transmitting only one channel of compressed data.
  3. Sampling rate and bit resolution - All data formats used by computers are inherently digital.  As a result, there is a limit to the faithful capture and reproduction of sound, which is analog in nature.  Since the average human ear responds to frequencies from 20 Hz to 20 kHz, most audio data is recorded at a sampling frequency of 44.1 kHz, which is slightly more than twice the highest audible frequency, as the sampling theorem requires.  To capture the dynamic range (loudness) of the sound at any instant of time, the analog signal must be converted into a binary number, which corresponds to a sound intensity.  If properly adjusted, the dynamic range is subdivided into 2^N levels, where N is the number of bits used to represent the sampled sound.  Audio CD uses 16 bits, which allows for 65,536 different levels to represent the sound intensity at an instant of time.  Reducing the sampling rate reduces the size of the data file, but eliminates higher frequencies.  Although this is not desirable in most cases, higher frequencies are not as crucial when dealing with human speech.  As a result, sampling at 8 - 10 kHz is sufficient.  Reducing the number of bits also allows for smaller audio files.  However, every bit removed halves the number of levels: dropping from 16 bits to 8 leaves only 256 levels instead of 65,536.
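
The mono check from item 1 and the size arithmetic from item 3 can be sketched in a few lines of Python.  This is only an illustration under simple assumptions: the channels are plain lists of integer samples, and the function names (is_mono, pack_channels, unpack_channels, pcm_bytes_per_second) are hypothetical, not part of any audio library or file format.

```python
def is_mono(left, right):
    """Return True when both stereo channels hold identical samples."""
    return len(left) == len(right) and all(l == r for l, r in zip(left, right))


def pack_channels(left, right):
    """Keep one channel if the stereo pair is redundant, otherwise both."""
    if is_mono(left, right):
        return {"channels": 1, "data": [list(left)]}
    return {"channels": 2, "data": [list(left), list(right)]}


def unpack_channels(packed):
    """Rebuild the (left, right) pair; a mono payload is simply duplicated."""
    data = packed["data"]
    if packed["channels"] == 1:
        return list(data[0]), list(data[0])
    return list(data[0]), list(data[1])


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed data rate: samples/s * bits/sample * channels / 8."""
    return sample_rate_hz * bits_per_sample * channels // 8
```

With these definitions, CD-quality stereo (44,100 Hz, 16 bits, 2 channels) works out to 176,400 bytes per second, sending a single channel halves that to 88,200, and 8 kHz, 8-bit mono speech needs only 8,000.  Because unpack_channels restores exactly what pack_channels received, the mono check loses no information.
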
Whenever lossy compression techniques are used, information is lost and the quality of reproduced audio is compromised.  The goal of compression technology is to minimize perceived deterioration in the audio signal and to minimize the amount of stored or transmitted information.
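
The trade-off between stored information and reproduction quality can be illustrated with a small frequency-filtering sketch.  It swaps in a directly computed discrete cosine transform (DCT) for the FFT mentioned above, purely to keep the example short and free of external libraries.  Real codecs such as MP3 are far more elaborate, and the helper names (dct, idct, keep_largest, rms_error) are assumptions made for this illustration.

```python
import math


def dct(signal):
    """Orthonormal DCT-II, computed directly in O(N^2) for clarity."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        total += sum(c * math.sqrt(2.0 / n)
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k, c in enumerate(coeffs) if k > 0)
        out.append(total)
    return out


def keep_largest(coeffs, keep):
    """Zero every coefficient except the `keep` largest in magnitude."""
    ranked = sorted(range(len(coeffs)),
                    key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

A typical use is rms_error(signal, idct(keep_largest(dct(signal), m))) for decreasing m.  Fewer coefficients need to be stored, but the error grows once coefficients that actually carry signal are discarded.  Dropping only coefficients that are already zero corresponds to the loss-less case.
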

Reducing the number of bits
In this exercise an audio file is modified to use different numbers of bits per sample.  This introduces a rounding (quantization) error in the intensity of the sound.  The sound is from 2001: A Space Odyssey.  Click on each of the files to hear the change in the sound quality.
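
How the files for this exercise were produced is not stated.  The sketch below shows one plausible way to do it, assuming signed 16-bit PCM samples held as Python integers and simple truncation of the low-order bits.  The function names (levels, requantize) are hypothetical.

```python
def levels(bits):
    """Number of distinct intensity levels available with `bits` bits."""
    return 2 ** bits


def requantize(samples, bits, full_bits=16):
    """Reduce signed integer PCM samples to `bits` bits of resolution.

    The low-order bits are discarded and the value is shifted back, so the
    result can still be played at the original bit depth (assumption:
    truncation, not rounding or dithering).
    """
    if not 1 <= bits <= full_bits:
        raise ValueError("bits must be between 1 and full_bits")
    shift = full_bits - bits
    # Python's >> on negative integers floors, so every sample moves down
    # to the nearest multiple of 2**shift.
    return [(s >> shift) << shift for s in samples]
```

Requantizing 16-bit audio to 8 bits leaves 256 distinct values instead of 65,536, and each sample can be off by up to 255 steps of the original scale.  That error is heard as added noise, which becomes more obvious as the bit count drops.
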


Image Compression

For an example of compression using the JPEG format, see the page on JPEG Compression.