Learn how Audio and Video are encoded on CDs, Blu-rays and Internet

amirm · Sep 29, 2011

I wrote an article for Widescreen Review magazine a couple of years ago, covering the basics of how audio and video are encoded for delivery to consumers. I thought it would be worthwhile to post an updated version of it. Here is the full article: Digital Audio and Video formats on CD, Blu-ray and Internet.

This is the audio portion. Click on the above link to read the rest of it covering video.

Get ready to learn the most fundamental concepts in audio/video, which sadly most people don’t understand or get badly wrong. I can’t blame them, as understanding these concepts properly requires a pretty deep understanding of how content streams are captured, encoded and transmitted. This is useful knowledge to have, as you would want to know how many songs or movies you can store on your 16 Gigabyte tablet or 4 Terabyte video server. It turns out that with some simple math and basic concepts we can learn everything we need here.

Let’s start at the top. My assumption in this article is that most of you know that there are eight bits in one byte. Okay, if you didn’t, don’t feel bad, as even the people who are supposed to know such as magazine writers and technical people in the field often confuse bits and bytes. As you can imagine, with almost an order of magnitude difference between these two terms, it is super important to get them right.

To avoid the above confusion I rarely abbreviate “b” for bit and “B” for byte, as is often done, and instead will spell them out as bits and Bytes. Since we usually deal with larger numbers of these, the prefixes Kilo, Mega, and Giga are added to represent thousands, millions, and billions, respectively.

Since these are computer concepts, the units here are not decimal but rather, binary. This means that “kilo” is actually 1024, not 1000. Throughout this article I will be using the familiar decimal versions of these numbers. The world won’t come to an end if we are off by 2.4 percent.
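As a quick check on how much the decimal shortcut costs us, here is a small Python sketch. The function and variable names are my own, made up for illustration. It compares each binary prefix with its decimal cousin and shows that the gap grows as the prefix gets larger.

```python
# Compare binary prefixes (powers of 1024) with decimal ones (powers of 1000).
PREFIXES = {"kilo": 1, "mega": 2, "giga": 3}

def binary_vs_decimal_error(power: int) -> float:
    """Percent by which the binary unit exceeds the decimal unit."""
    return (1024 ** power / 1000 ** power - 1) * 100

for name, power in PREFIXES.items():
    print(f"{name}: binary unit is {binary_vs_decimal_error(power):.1f}% larger")
```

So the 2.4 percent figure applies to kilo; the difference is a bit larger for mega and giga.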

Let’s dive right in and talk about audio. To record audio on CDs, we have to convert the analog audio to digital. The “sampling rate” is 44.1 KHz, which means that we digitize the analog value 44,100 times per second (frequency is measured in Hz, which means one “cycle” per second; in this case, one sample per second). At the risk of sounding pedantic, CD is stereo, which comprises two independent channels. Each audio sample in turn has 16 bits of resolution, or two bytes.

If we multiply 44.1 KHz (the sampling rate) by two (number of channels), and then by 16 (bits of resolution), we get the data rate of CD in bits per second. The result is 1,411 Kbits/sec (“kbps”) or roughly 1.4 Mbit/sec (Mbps). Converting this to bytes, we get about 176 KBytes/sec.
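The multiplication above can be sketched in a few lines of Python. The helper name `pcm_bit_rate` is hypothetical, something I made up for this illustration.

```python
# CD audio parameters as described in the text.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CHANNELS = 2              # stereo
BITS_PER_SAMPLE = 16

def pcm_bit_rate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * channels * bits_per_sample

cd_bits_per_sec = pcm_bit_rate(SAMPLE_RATE_HZ, CHANNELS, BITS_PER_SAMPLE)
cd_bytes_per_sec = cd_bits_per_sec / 8   # eight bits in one byte

print(f"{cd_bits_per_sec / 1000:.1f} Kbits/sec")    # 1411.2 Kbits/sec
print(f"{cd_bytes_per_sec / 1000:.1f} KBytes/sec")  # 176.4 KBytes/sec
```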

When a CD drive is used for data there are other overheads leading the industry to use 150 Kbytes/sec as the reference baseline speed for a CD drive. This has become a marketing spec for optical media where the speed of the drive is specified by a number followed by “X” with X representing this 150 Kbytes/sec. So if you see a “20X” drive, it means that it can read at 20 * 150, or 3 MBytes/sec. I said this was a marketing spec because in reality the drive speed is variable depending on the track being read so average speed across the entire media (e.g. when you rip a CD) is lower than this number.
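Here is the “X” rating arithmetic as a minimal Python sketch. The function name is made up, and the result is the nominal marketing figure, not the average speed you would actually see when ripping a disc.

```python
BASE_SPEED_KBYTES_PER_SEC = 150   # the "1X" CD data reference speed

def nominal_drive_speed_mbytes(x_rating: int) -> float:
    """Nominal read speed in MBytes/sec for a drive rated at x_rating."""
    return x_rating * BASE_SPEED_KBYTES_PER_SEC / 1000

print(nominal_drive_speed_mbytes(20))   # 3.0
```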

So far we are in the traditional A/V realm. Let’s jump into the new era. Here, we are talking about things like “128 kbps MP3.” This audio stream is exactly what it says it is. That is, the music file is represented as 128 Kbit/sec, compressed in MP3 format. This compares to our original CD source at 1.4 Mbit/sec.

To figure out how much the file is compressed, we simply divide the MP3’s data rate (128 Kbits/sec or 0.128 Mbits/sec) by the CD’s data rate (1.4 Mbits/sec) and arrive at 0.09. In other words, the MP3 file represents only 9 percent of the data of the original source. Perhaps more interesting is the inverse ratio, which tells us that a whopping 91 percent of the original information has been thrown out and you are hearing what is left! While as audiophiles we want our music to be of higher fidelity than 128 kbps MP3, it is remarkable how much quality is preserved relative to how few bits are contained in that file.
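The same division in Python, as a rough sketch with hypothetical names. I use the unrounded CD rate of 1,411.2 Kbits/sec here, which lands on the same 9 percent.

```python
CD_KBPS = 1411.2    # uncompressed CD data rate in Kbits/sec
MP3_KBPS = 128.0    # the MP3 example from the text

def fraction_kept(compressed_kbps: float, source_kbps: float) -> float:
    """Fraction of the source data rate that survives compression."""
    return compressed_kbps / source_kbps

kept = fraction_kept(MP3_KBPS, CD_KBPS)
print(f"kept: {kept:.0%}, discarded: {1 - kept:.0%}")   # kept: 9%, discarded: 91%
```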

Keep in mind that sampling rate and bit rates of compressed files are two entirely different things. A 128 Kbits/sec MP3 has the same sampling rate as the uncompressed music. Same is true of 256 Kbit/sec MP3 and 384 Kbits/sec. They are compressed versions of the same 44.1 KHz audio stream. So don’t make the common mistake of talking about the bit rate of the file as sampling rate.

Back to our original uncompressed CD, if we multiply its 176 Kbytes/sec data rate by 3,600 (seconds in one hour), we get the total space consumed for one hour of music, which is about 634 MBytes (rounded to 650 MBytes to include overhead).

Now let’s apply the same math to the MP3. The 0.128 Mbit/sec must be divided by eight to convert it to bytes and then multiplied by 3,600 to get the same capacity requirement. This adds up to 57.6 MBytes/hour, showing the remarkable saving in storage capacity when using “lossy” audio compression. This is the reason that solid-state “flash”-based music players can hold so much compressed music. For a typical three-minute song, a 128 kbps MP3 would take up 2.88 MBytes of space. So if you have a 4-Gigabyte flash memory player, it can hold 4,000 / 2.88 or 1,388 songs. The same player would only hold 125 songs in the original uncompressed format that is on a CD.
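If you want to redo these storage numbers for your own player, here is a minimal Python sketch. The names are made up for illustration, and it uses decimal megabytes as in the rest of the article.

```python
def size_mbytes(kbps: float, seconds: float) -> float:
    """Storage in (decimal) MBytes for a stream at kbps lasting `seconds`."""
    return kbps / 8 * seconds / 1000   # bits -> bytes, then KBytes -> MBytes

PLAYER_MBYTES = 4000    # the 4-Gigabyte flash player from the text
SONG_SECONDS = 180      # a typical three-minute song

mp3_hour = size_mbytes(128, 3600)             # about 57.6 MBytes per hour
mp3_song = size_mbytes(128, SONG_SECONDS)     # about 2.88 MBytes per song
cd_song = size_mbytes(1411.2, SONG_SECONDS)   # about 31.75 MBytes per song

mp3_songs = int(PLAYER_MBYTES // mp3_song)    # whole songs that fit
cd_songs = int(PLAYER_MBYTES // cd_song)

print(mp3_songs, cd_songs)   # 1388 125
```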

Let’s put this in the context of audio for DVD and Blu-ray Disc. Here, we are using “5.1” channels of content to create a surround experience. The notation means we have five full-frequency channels, and a sixth low-bandwidth channel indicated by the “.1.” Note that we don’t really have 5.1 channels as a mathematical figure because the low-frequency channel does not equate to 10 percent of a full-bandwidth channel. But for the sake of simplifying our life, let’s pretend that it does use 10 percent as many bits to do its job and use the number 5.1 just like we used 2.0 for stereo computations.

For sampling rate of surround music, the standard in the industry is to deliver 48 KHz as opposed to 44.1 used in CDs. The sample resolution can be 16, 20, or 24 bits.

To arrive at the data rate of the uncompressed 5.1 source, and assuming 20-bit samples, we just need to multiply all of this together as we did with the CD: 5.1 (channels) * 48 (sampling rate in KHz) * 20 (bits of resolution) = 4,896 Kbits/sec, or about 4.9 Mbits/sec. To figure it out for 16 and 24 bits, simply swap out the 20 for those numbers.
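Swapping out the bit depth is easy to do in code. This is a rough Python sketch with a made-up function name, and it keeps the same simplifying assumption of treating the low-frequency channel as 0.1 of a channel.

```python
def surround_rate_mbps(bits_per_sample: int,
                       channels: float = 5.1,
                       sample_rate_khz: float = 48.0) -> float:
    """Uncompressed surround data rate in Mbits/sec.

    Counting the LFE channel as 0.1 of a channel is the article's
    simplification, not a real property of the format.
    """
    return channels * sample_rate_khz * bits_per_sample / 1000

for bits in (16, 20, 24):
    print(f"{bits}-bit: {surround_rate_mbps(bits):.1f} Mbits/sec")
```

That works out to about 3.9, 4.9 and 5.9 Mbits/sec for 16, 20 and 24 bits.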

If we take the uncompressed data rate of 4.9 Mbit/sec, divide it by 8 to get bytes/sec, and then multiply by 3,600, we get a capacity requirement of 2.2 GBytes/hour. A two-hour movie would then need 4.4 Gigabytes just for the audio, or about half the capacity of a dual-layer DVD (8.5 GBytes)!
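The capacity math as a minimal Python sketch (hypothetical names, decimal gigabytes):

```python
def gbytes_per_hour(mbps: float) -> float:
    """Storage in (decimal) GBytes consumed by one hour at `mbps` Mbits/sec."""
    return mbps / 8 * 3600 / 1000   # Mbits -> MBytes, one hour, then -> GBytes

audio_gb_per_hour = gbytes_per_hour(4.9)
two_hour_movie_gb = 2 * audio_gb_per_hour

print(f"{audio_gb_per_hour:.1f} GBytes/hour, {two_hour_movie_gb:.1f} GBytes for two hours")
```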

Movies on DVD are therefore compressed using Dolby® Digital (AC-3) compression at a typical data rate of 448 kbps. So let’s compute the compression ratio as we did with MP3. We simply repeat the same math by dividing 0.448 (data rate of Dolby Digital) by 4.9 (data rate of the uncompressed audio) and get 0.09. So as in the case of MP3, quite a bit (91 percent) is thrown out in the process of compressing the multichannel audio. DTS® Digital Surround™, in contrast, at 1.5 Mbit/sec, would represent 30 percent of the original, or a very mild 3:1 compression ratio (although “half-rate” DTS at 750 Kbit/sec is also applying a reasonable amount of compression at 6:1).
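To line up the three lossy surround rates side by side, here is a small Python sketch. The dictionary and function names are mine, and the source is the 20-bit, 48 KHz, 5.1 stream computed earlier.

```python
SOURCE_KBPS = 5.1 * 48 * 20   # uncompressed 20-bit 5.1 source, about 4,896 Kbits/sec

# Data rates quoted in the text, in Kbits/sec.
FORMATS_KBPS = {
    "Dolby Digital": 448,
    "DTS": 1500,
    "DTS half-rate": 750,
}

def fraction_of_source(kbps: float) -> float:
    """Fraction of the uncompressed source rate used by a format."""
    return kbps / SOURCE_KBPS

for name, kbps in FORMATS_KBPS.items():
    kept = fraction_of_source(kbps)
    print(f"{name}: keeps {kept:.0%}, roughly {1 / kept:.1f}:1")
```

Dolby Digital comes out at about 9 percent of the source, full-rate DTS at about 31 percent, and half-rate DTS at about 15 percent.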

Let’s put things in perspective in a different way. If we divide 448 Kbit/sec by 5.1 channels, we get 88 Kbits/sec allocated to each channel on average. If we had a stereo track at the same rate, it would be at 2 (channels) * 88 (data rate), or 176 Kbits/sec. So, this Dolby Digital encoding has a roughly 37 percent higher data rate than the 128 Kbits/sec MP3 example from before. While the compression techniques are different between the two formats, one can still see that the 448 kbps Dolby Digital is able to “breathe more,” as far as data rate is concerned, compared to 128 kbps MP3. So in theory, this encoding is more transparent to the source.
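The per-channel comparison can be sketched like this in Python. The names are hypothetical, and this only compares raw data rates. It says nothing about how the two codecs actually sound.

```python
DOLBY_KBPS = 448
MP3_STEREO_KBPS = 128

per_channel_kbps = DOLBY_KBPS / 5.1          # about 88 Kbits/sec per channel
stereo_equivalent_kbps = 2 * per_channel_kbps

percent_higher = (stereo_equivalent_kbps / MP3_STEREO_KBPS - 1) * 100

print(f"per channel: {per_channel_kbps:.0f} Kbits/sec")
print(f"stereo equivalent: {stereo_equivalent_kbps:.0f} Kbits/sec")
print(f"{percent_higher:.0f}% above 128 kbps MP3")
```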

When it comes to Blu-ray disc, we have a third option: lossless encoding. This is a process by which the audio data rate is reduced but the full fidelity is still maintained. Think of it as compressing your files on your computer and how you can get them back intact after decompression. Dolby TrueHD and DTS-HD™ Master Audio are both lossless surround audio formats supported optionally in Blu-ray Disc.

The price we pay here is that lossless compression is far less efficient than lossy techniques like MP3. Typical compression ratios are about 2:1 for music, reaching up to 3:1 for multichannel movie sound. The efficiency becomes higher with more channels and (non-intuitively) lower at higher bit resolutions (e.g. 24 bits compared to 16 bits). Using a rough figure of 2.5:1, we save a whopping 2.6 GBytes of space from our two-hour movie.
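Here is the savings arithmetic as a minimal Python sketch with made-up names. The 2.5:1 ratio is the rough figure from the text, not a measured value for any particular disc.

```python
UNCOMPRESSED_GB = 4.4    # two-hour 5.1 soundtrack, from the earlier calculation
LOSSLESS_RATIO = 2.5     # rough lossless compression ratio assumed in the text

compressed_gb = UNCOMPRESSED_GB / LOSSLESS_RATIO
saved_gb = UNCOMPRESSED_GB - compressed_gb

print(f"compressed: {compressed_gb:.2f} GBytes, saved: {saved_gb:.2f} GBytes")
```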

Again, the rest is here: Digital Audio and Video formats on CD, Blu-ray and Internet.

amirm · Sep 30, 2011

Good grief. Someone managed to get to the end without a headache after seeing all those numbers.

A quiz: why does the efficiency of surround lossless compression go down as you increase bit depth?

DonH50 · Nov 22, 2011

I know, I know!

Nice article. Will the world come to an end if we are off by 2.5%?

amirm · Nov 22, 2011

Thanks Don. But you are worse than me with teasing people.

Answering the question, the lower order bits in audio samples tend to consist more of noise than real data. Since noise is random, it doesn't compress well (or at all). So if you go from 16 to 24 bits, your compression efficiency drops.
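You can see the effect with a toy experiment. This Python sketch is only an illustration under some loud assumptions: it uses the general-purpose `zlib` compressor from the standard library as a stand-in (it is not Dolby TrueHD, DTS-HD Master Audio or any real audio codec), the "music" is a single 441 Hz tone, and the extra 8 bits are either all zeros or pure random noise. All names are made up.

```python
import math
import random
import zlib

SAMPLE_RATE_HZ = 44_100
TONE_HZ = 441           # assumption: chosen so one cycle is exactly 100 samples
NUM_SAMPLES = 44_100    # one second of mono audio

def pack_24bit(samples_16bit, noisy_low_byte, rng):
    """Pack 16-bit samples into 24-bit little-endian words.

    The extra low 8 bits are zero, or random noise if noisy_low_byte is True.
    """
    out = bytearray()
    for s in samples_16bit:
        low = rng.randrange(256) if noisy_low_byte else 0
        word = ((s & 0xFFFF) << 8) | low   # two's-complement sample in the top 16 bits
        out += word.to_bytes(3, "little")
    return bytes(out)

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))

rng = random.Random(0)   # fixed seed so the run is repeatable
tone = [round(20_000 * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ))
        for n in range(NUM_SAMPLES)]

clean_ratio = compression_ratio(pack_24bit(tone, False, rng))
noisy_ratio = compression_ratio(pack_24bit(tone, True, rng))

print(f"low bits zero:  {clean_ratio:.1f}:1")
print(f"low bits noise: {noisy_ratio:.1f}:1")
```

The stream with noise in its bottom 8 bits compresses far worse, because a third of its bytes are random and cannot be squeezed. Real recordings are not this extreme, but the same thing is going on.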
