Forum members have probably heard the term "jitter" mentioned in the context of digital audio reproduction many times. Googling the term "audio jitter" generates 780,000 hits! Yet I doubt that some of the salient points of jitter are mentioned in the midst of the heated discussions that occur around it.
This article is not designed to be complete coverage of the topic but rather to answer a question posed in another thread, when I mentioned the minimum spec that must be met for a digital system to fully reproduce CD frequency response and resolution. The number is a surprising 0.25 billionth of a second. This article explains why, when we only have 44,100 samples per second, we need such mind-numbing timing accuracy to reproduce them.
First, an ultra-quick introduction to jitter: jitter is variation in the timing of a signal, arising from one or more causes. What is a timing signal? Digital systems always operate according to a "clock." A clock is a sequence of pulses which tells the system when to do its next thing. In audio, every clock tick tells the system to input or output another audio sample. The theory of digital audio reproduction assumes that the clock is infinitely precise. Unfortunately, this is never true in real life. In computers, the "GHz" number you hear is a similar concept, determining how fast the system executes instructions.
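To put the timescales in perspective before we dive in, here is a quick back-of-the-envelope comparison (a sketch of my own; the 250 picosecond figure is the spec we will derive later in the article):

```python
# Sketch: the CD sample period versus the jitter spec derived later on.
sample_rate = 44_100              # CD sample rate, in samples per second
sample_period = 1 / sample_rate   # time between clock ticks, in seconds
jitter_spec = 250e-12             # 250 picoseconds (derived later)

print(f"Sample period: {sample_period * 1e6:.2f} microseconds")
print(f"Jitter spec:   {jitter_spec * 1e12:.0f} picoseconds")
print(f"Ratio:         {sample_period / jitter_spec:,.0f} to 1")
# The clock must be accurate to roughly 1/90,000th of the gap
# between two consecutive samples.
```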
Clock sources, like any other electronic circuit or your watch, have limits on how precise they can be. Variations can occur due to many, many different causes which I won't go through right now. One way or the other, though, these variations cause the clock pulses to dance back and forth instead of occurring precisely at the moment they are supposed to trigger.
You might be asking: so what if the timing changes a bit here and there? How can that change the sound? The answer lies in transforming the signal from the time domain into the frequency domain. Once there, we can see whether the spectrum of the signal changes. Here, the picture clears up quite a bit. If you take a signal and move it back and forth in time using another signal (the jitter), what you get at the output is your original signal, plus "sideband" shadows of it offset above and below by the frequency of the jitter signal.
For example, if you try to play back a 5 kHz signal and you have a timing variation with a frequency of 1 kHz, what the system will reproduce is the 5 kHz signal, plus shadows of it at 4 and 6 kHz. These spurious frequencies are clearly distortion products, as they did not exist in the original signal, and as such can be used to compute the distortion of the system.
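If you would like to see this effect for yourself, here is a small simulation in Python with NumPy (my own sketch, with a deliberately exaggerated jitter amplitude of 100 nanoseconds so the sidebands stand out; real clocks are far better than this):

```python
# Sketch: sample a 5 kHz tone with a clock that wobbles sinusoidally at
# 1 kHz, then inspect the spectrum for the 4 and 6 kHz sidebands.
import numpy as np

fs = 44_100          # sample rate, Hz
f_sig = 5_000        # tone frequency, Hz
f_jit = 1_000        # jitter frequency, Hz
J = 100e-9           # peak jitter, seconds (grossly exaggerated on purpose)

ideal_t = np.arange(fs) / fs                                  # one second of ideal tick times
actual_t = ideal_t + J * np.sin(2 * np.pi * f_jit * ideal_t)  # jittered tick times
x = np.sin(2 * np.pi * f_sig * actual_t)                      # tone seen through the bad clock

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))        # 1 Hz per FFT bin
spectrum_db = 20 * np.log10(spectrum / spectrum.max())

for f in (4_000, 5_000, 6_000):
    print(f"{f} Hz: {spectrum_db[f]:6.1f} dB")
# Expect ~0 dB at 5 kHz and shadows near -56 dB at 4 and 6 kHz,
# matching 20*log10(2*pi*f_sig*J/2).
```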
The question then becomes: how much jitter is audible? That can be a long, never-ending debate. What I want to drill down into here is the answer to the original question: how low does the jitter need to be for us to reproduce the full resolution of the audio signal? Whether you can hear the distortion if your CD system is reproducing 15 or 14 bits instead of 16 due to jitter is a different topic.
There is a great paper that answers the question at hand, published years ago in the Audio Engineering Society (AES) conference proceedings: http://www.nanophon.com/audio/jitter92.pdf. Alas, it is written for industry experts, so it can be hard to digest. So here is a simplification of it.
The paper computes jitter's contribution to distortion by making the simplifying assumption that the jitter is sinusoidal in nature. The 60 Hz hum from the power supply in your audio equipment is an example of such a source. It modulates the clock timing at the frequency of the power coming into your house, which, if you look at it on a scope, is a sine wave. Unfortunately, many causes of jitter are of a different nature and, audibly, can be more disruptive. For the purposes of this article, though, we won't need to drill into that.
If we plug the numbers into the paper's formula for the distortion caused by sinusoidal jitter, we arrive at this conclusion:
"For sinusoidal jitter of amplitude J=500ps, a 20 KHz maximum level tone will produce sidebands at -96.1 dB relative to the input tone."
Let's put that in English. The formula is telling us that if we want to reproduce a frequency response which extends to 20 kHz (the typical end of the human hearing spectrum, and within the capability of CD audio), then a jitter value of 500 picoseconds, i.e., 250 picoseconds above and below the ideal time, creates a distortion product which is weaker than the signal by 96 dB.
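For those curious where that number comes from, here is a reconstruction of the small-signal math (my own sketch of the standard analysis, reading the paper's 500 ps as the peak-to-peak swing, so J = 250 ps is the peak deviation). Sampling a full-scale tone of angular frequency $\omega_s$ with a clock displaced by $J\sin(\omega_j t)$ gives:

$$
\sin\!\big(\omega_s(t + J\sin\omega_j t)\big) \;\approx\; \sin(\omega_s t) + \frac{\omega_s J}{2}\Big[\sin\big((\omega_s + \omega_j)t\big) - \sin\big((\omega_s - \omega_j)t\big)\Big]
$$

Each sideband therefore sits at

$$
20\log_{10}\frac{\omega_s J}{2} \;=\; 20\log_{10}\frac{2\pi \times 20{,}000 \times 250 \times 10^{-12}}{2} \;\approx\; -96.1\ \text{dB}
$$

relative to the tone, which is exactly the figure quoted above.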
OK, that is still Greek to many. Let's apply one more level of translation. In digital processing, every bit of resolution is roughly equivalent to 6 dB of dynamic range. So a 16-bit CD audio signal has a 96 dB (16 × 6) range between the quietest and loudest portions. Put another way, if you feed the system a 0 dB reference signal, a 16-bit signal is able to reproduce detail that is 96 dB below that. Anything lower would be noise/distortion generated by the 16-bit system.
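The 6 dB-per-bit rule is not magic: each extra bit doubles the number of quantization levels, and each doubling of amplitude is about 6.02 dB, so

$$
20\log_{10}\big(2^{16}\big) = 16 \times 20\log_{10} 2 \approx 16 \times 6.02 \approx 96.3\ \text{dB}
$$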
With me so far? Now let's combine the two translations and realize that the -96 dB number in the paper was not picked at random; it is the noise/distortion floor of 16-bit audio. Therefore, if we want to reproduce all that 16-bit audio can give us, up to the desirable 20 kHz response, then the clock timing cannot vary by more than 250 picoseconds. The jitter spec must therefore be below 250 picoseconds.
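We can also run the requirement the other way and solve the sideband formula for the largest jitter a 16-bit, 20 kHz system can tolerate. A quick sketch, using the same small-signal formula reconstructed above:

```python
# Sketch: solve (2*pi*f*J)/2 <= 2**-16 for J, the peak jitter that keeps
# the jitter sidebands at or below the 16-bit floor for a 20 kHz tone.
import math

f_max = 20_000        # highest audio frequency we want to protect, in Hz
floor = 2 ** -16      # 16-bit noise floor as an amplitude ratio (~ -96.3 dB)

J_peak = 2 * floor / (2 * math.pi * f_max)
print(f"Maximum peak jitter: {J_peak * 1e12:.0f} ps")
# Prints: Maximum peak jitter: 243 ps, i.e. about a quarter of a nanosecond.
```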
What is a picosecond? It is an awfully small unit of time. If you have forgotten the definition, here it is, from http://en.wikipedia.org/wiki/Picosecond:
"A picosecond is 10⁻¹² of a second. That is one trillionth, or one millionth of one millionth of a second, or 0.000 000 000 001 seconds. A picosecond is to one second as one second is to 31,700 years."
At this point in the article, people usually ask if the above is audible. Once more, this article is not about what is audible but rather what the system specification needs to be for the system to reproduce what it is advertised to do. CD audio advertises 16 bits of sample resolution and 22 kHz frequency response. Therefore, its jitter spec must be below 0.25 billionth of a second. Period. The math dictates this, so there is no debating it. The precision required here is incredibly high: orders of magnitude beyond what anyone not skilled in the science would imagine.