Why 24/192 is a bad idea?

amirm

Banned
Apr 2, 2010
15,813
38
0
Seattle, WA
Dan Lavry has a fair bit to say on this - distilled, it's really that there's always a trade-off between accurate conversion and fast conversion. So all things being equal, there needs to be a sound engineering reason for going beyond 96k. I can't see one myself, despite having trawled around for several years.
Indeed. If one looks at the distortion specs for DACs, often the higher sampling rates have higher distortion. And if one believes that all of that bandwidth is needed, then the impact of jitter goes way up too. The accuracy needed at 192 kHz, 24-bit is just crazy high! By my quick math, you need better than one picosecond of jitter to achieve 24-bit accuracy! Even for 20 bits you need better than 10 picoseconds.
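Those figures are easy to sanity-check. A rough sketch of the arithmetic (my own, not from the thread), using the standard jitter-limited-SNR relation SNR = -20·log10(2π·f·tj) and the ideal N-bit SNR of 6.02·N + 1.76 dB:

```python
import math

def max_jitter_seconds(f_hz: float, bits: int) -> float:
    """Worst-case RMS clock jitter that keeps jitter-induced noise below
    the quantization noise floor of an N-bit converter, for a full-scale
    sine at f_hz.  Uses SNR_jitter = -20*log10(2*pi*f*tj) and the ideal
    N-bit SNR of 6.02*N + 1.76 dB."""
    snr_db = 6.02 * bits + 1.76
    return 1.0 / (2 * math.pi * f_hz * 10 ** (snr_db / 20))

# A full-bandwidth signal in a 192 kHz system can approach 96 kHz:
print(max_jitter_seconds(96e3, 24))  # ~8e-14 s, i.e. well under 0.1 ps
print(max_jitter_seconds(96e3, 20))  # ~1.3e-12 s, i.e. about a picosecond
```

The exact numbers depend on the signal frequency you assume, but they land in the same picosecond-and-below ballpark as the post.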

Great first post by the way. :)
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
Thanks :)

What I find interesting is the relative ignorance (myself included, until very recently) about the dynamic characterizations of DACs (the chips, not the hifi boxes). Historically they were talked about (when I was an EE major, in the days before CD) from the industrial process side of things, which means measurements like INL (integral non-linearity), DNL (differential non-linearity) and monotonicity. But those industrial guys do recognize dynamic aspects like glitch energy. When we look at an audio DAC's datasheet we seem to get the worst of both worlds - nothing about glitch, nothing about INL and DNL, just a few single-tone THD figures (FFT plots if we're very lucky).

I only discovered the whole new world of DAC specs when I started looking at DACs designed for communications purposes (GSM base-stations and the like) where dynamic performance is the most crucial metric. An eye-opener to say the least!
 

Lee

Well-Known Member
Feb 3, 2011
3,247
1,767
1,260
Alpharetta, Georgia
Indeed. If one looks at the distortion specs for DACs, often the higher sampling rates have higher distortion.

Quite true for most hardware as well. That is why Benchmark up/downsamples to 110 kHz, per the Elias Gwinn paper: they found that 110 kHz is optimal for the kind of chips they use.
 

DonH50

Member Sponsor & WBF Technical Expert
Jun 22, 2010
3,952
312
1,670
Monument, CO
Thanks :)

What I find interesting is the relative ignorance (myself included, until very recently) about the dynamic characterizations of DACs (the chips, not the hifi boxes). Historically they were talked about (when I was an EE major, in the days before CD) from the industrial process side of things, which means measurements like INL (integral non-linearity), DNL (differential non-linearity) and monotonicity. But those industrial guys do recognize dynamic aspects like glitch energy. When we look at an audio DAC's datasheet we seem to get the worst of both worlds - nothing about glitch, nothing about INL and DNL, just a few single-tone THD figures (FFT plots if we're very lucky).

I only discovered the whole new world of DAC specs when I started looking at DACs designed for communications purposes (GSM base-stations and the like) where dynamic performance is the most crucial metric. An eye-opener to say the least!

The IEEE has a standard (1241) for characterizing ADCs that is well worth having if you are really interested in the specs and what they mean. DACs are similar, at least from the key-specs point of view.

If you go to the manufacturer's DAC data sheets, the ones for the chips, not the audiophile boxes, you'll find many more specs. Glitch energy is more the domain of HF DACs, true; it is less an issue at audio.
 

bdiament

Member
Apr 26, 2012
196
0
16
New York area
My 2 cents on this:

Hearing folks say 4x sample rates (i.e., 176.4k and 192k) are inferior or somehow undesirable feels to me a lot like folks telling me there are no colors in a rainbow.

Two points (1 cent each? ;-}):

1. In my experience, a great many DACs that are spec'd for 4x rates actually sound *worse* at these rates than they do at 2x rates (i.e., 88.2k and 96k). This, I attribute to the significantly increased demands on clocking accuracy and on the need for analog stages that can perform (for real) at wide bandwidth.

2. When the 4x rates are done right (I'm a big fan of Metric Halo's ULN-8, which I use, and the mic-preamp-less LIO-8 version of the same design), something magical happens. A threshold is crossed - one that is not crossed at 24/96 - and for the very first time in my experience, I'm getting back the sound of my mic feeds. I've never experienced this before with any analog machine or with any digital device in the past. To me, this is something to be shouted from the digital rooftops. I feel the jump from 24/96 to 24/192 is larger than the jump from 16/44 to 24/96, for the simple reason of that threshold. It no longer sounds like "great digital" (or "great analog"), it sounds like the output from the microphones!

Another thing I see way too much is folks thinking the benefits of 24-bit are the theoretical 144 dB of "dynamic range" (more accurately in this case, signal-to-noise ratio). They'll say it is impractical, unusable and unrealistic because other electronics don't have noise floors that low and 144 dB of "dynamics" will damage the listener. I say this completely misses the point (much less misses what I find to be immediate and obvious sonic benefits).

It isn't about "dynamic range" (particularly in an age when anything over 10 dB is deemed "dynamic" ;-** ). It is about *resolution*. Real music - even studio-created "music" for that matter, provided it isn't dynamically eviscerated - isn't at max volume all the time. With real music, and with recordings where dynamics have not been compromised, the average level will be 20, 30, sometimes 40 dB down from the maximum peaks. With a 16-bit system, a signal that is 20 dB down from max will be encoded using about 12 or 13 bits. The quieter parts of very dynamic music, say at -40 dB, will be encoded using about 9 bits. What this lower resolution means is that instrumental harmonics lose complexity, get "bleached" and thinned. The space around the players becomes "darkened" and de-focused. Keep in mind that even when the music is at max level, instrumental harmonics and things like spatial cues are *much* lower in level. Is it any wonder then, that at 16 bits, these are the first things to disappear?

At 24 bits, that -20 dB signal will be encoded using 20 or 21 bits. That -40 dB signal will be encoded using about 17 bits - still better resolution than 16-bit on its best day, going downhill with the wind behind it. It isn't about dynamics or about signal-to-noise ratio; it is about resolution of detail, it is about capturing and being able to play back musical information.
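The bit counts in the two paragraphs above follow from the usual rule of thumb of roughly 6.02 dB per bit. A quick sketch of that arithmetic (the function name is mine; whether these "bits in use" equate to audible resolution is disputed later in the thread):

```python
def bits_in_use(word_bits: int, level_dbfs: float) -> float:
    """Approximate number of bits exercised by a signal peaking at
    level_dbfs, assuming roughly 6.02 dB per bit."""
    return max(0.0, word_bits + level_dbfs / 6.02)

for level_db in (0, -20, -40):
    print(f"{level_db:4d} dB: 16-bit -> {bits_in_use(16, level_db):4.1f} bits, "
          f"24-bit -> {bits_in_use(24, level_db):4.1f} bits")
# -20 dB comes out near 12.7 bits (16-bit) and 20.7 bits (24-bit),
# -40 dB near 9.4 and 17.4 - matching the figures quoted in the post.
```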

Same with the higher sample rates and the wider bandwidth. It isn't about hearing the frequencies bats, dogs and dolphins hear; it is about getting the *audible* spectrum correct, particularly in the *time* domain. KEF in the UK did research decades ago showing that correct time response for frequency x requires a bandwidth of 5x.

Perhaps surprisingly (it was to me), one of the greatest benefits of 4x sample rates occurs at the *bottom* end. With well done 24/192, I'm hearing low end unlike anything I've ever heard via an audio system but very much like what bass sounds like in real life. It loses the undifferentiated quality it has on all too many recordings and it loses the somewhat mechanical quality it has on a lot of digital recordings. It consists of pitch and it pressurizes the air, just as it does in real life.

So folks can publish all the white papers they like. (Charmin, too, is a white paper. ;-}) They can tell me there are no colors in the rainbow.
They can tell me I'm losing "accuracy", when I'm absolutely overjoyed to have, for the very first time, the unadulterated sound of my mic feed.
I don't consider that a loss of accuracy, and deem such claims quite silly in view of the fact that it is, in truth, a level of accuracy unlike anything I've heard in audio.
I won't waste time arguing with them. I'm too busy enjoying the marvels of Keith Johnson's work at 4x rates and making my own recordings at 24/192 - and very much enjoying being able to distribute them to listeners who, with the right setup, will hear *exactly* what I hear in the studio.

Downside? Yes, I've found one. I can no longer blame my gear for any flaws in the recording. Of any that exist, I must take full ownership.

Best regards,
Barry
www.soundkeeperrecordings.com
www.barrydiamentaudio.com
 

Lee

Well-Known Member
Feb 3, 2011
3,247
1,767
1,260
Alpharetta, Georgia
Good information as always Barry.
 

microstrip

VIP/Donor
May 30, 2010
20,807
4,700
2,790
Portugal
(...) An interesting exercise I once did was to compare the advances in multibit DAC chips over time - a side-by-side comparison between the PCM63-K and the PCM1704-K. What I found was that although the PCM1704's headline (read: 0 dBFS) figure was better, it was actually 4 dB worse than the PCM63 where it really matters, which is down at -20 dBFS. I wondered if one reason for this was the higher sample rate it's measured at (768k vs 384k).

I have a suspicion (by no means yet supported by hard evidence) that one reason NOS DACs get such good subjective reviews is down to them working less hard and hence giving more accurate results in practice.

Very interesting. It is just anecdotal, but I once owned a Sony X7ESD using the PCM63-K that I modified myself (better ICs, capacitors and some supply components, nothing else). For a long time after I let it go, I felt its replacements were just steps back. Yes, even allowing for DIY expectation bias, it really sounded great. :)

But the main reason for my post is your statement "where it really matters, which is down at -20dBFS". Why do you feel that the most important zone of the scale is -20 dBFS?
 

Ethan Winer

Banned
Jul 8, 2010
1,231
3
0
75
New Milford, CT
I'm surprised nobody else corrected this yet:

It isn't about "dynamic range" (particularly in an age when anything over 10 dB is deemed "dynamic" ;-** ). It is about *resolution*. Real music - even studio-created "music" for that matter, provided it isn't dynamically eviscerated - isn't at max volume all the time. With real music, and with recordings where dynamics have not been compromised, the average level will be 20, 30, sometimes 40 dB down from the maximum peaks. With a 16-bit system, a signal that is 20 dB down from max will be encoded using about 12 or 13 bits. The quieter parts of very dynamic music, say at -40 dB, will be encoded using about 9 bits. What this lower resolution means is that instrumental harmonics lose complexity, get "bleached" and thinned. The space around the players becomes "darkened" and de-focused.

I don't know what "bleached" and "de-focused" mean, but this misses a key component of digital audio - the reconstruction filter. When digital audio is properly implemented - and it almost always is these days - the only benefit of using more bits is a lower background noise level. The vertical "step" resolution is the same, because the reconstruction filter removes the steps. This is explained in my new audio book, but you don't have to take my word for it. Ken Pohlmann's Principles of Digital Audio explains it in even more depth.

--Ethan
 

bdiament

Member
Apr 26, 2012
196
0
16
New York area
There is nothing in what you quoted that is not correct, Ethan.

You have your perspectives, with which I'm familiar - and I have mine, based on my experiences.
Bit usage can be seen using tools like SpectraFoo, by anyone who cares to look.
Bleached harmonics and de-focused space can be heard by anyone who cares to listen.

Barry
www.soundkeeperrecordings.com
www.barrydiamentaudio.com
 

jkeny

Industry Expert, Member Sponsor
Feb 9, 2012
3,374
42
383
Ireland
Ethan, can you drop the "book" advertising in every post? It grates somewhat - you already have it in your sig!
 

amirm

Banned
Apr 2, 2010
15,813
38
0
Seattle, WA
When digital audio is properly implemented - and it almost always is these days ---Ethan
How do you know this, Ethan? Few equipment manufacturers release any measurements of their DACs. Instead, they push the "codec" (DAC chip) specs ("32-bit, 192 kHz") as if they applied to their implementation. What is your definition of a proper implementation?
 

DonH50

Member Sponsor & WBF Technical Expert
Jun 22, 2010
3,952
312
1,670
Monument, CO
Comment: Distortion typically rises with increasing sample rate, even for similar signal frequencies, because the sampling circuits have less time to acquire and settle. Noise typically rises due to the wider bandwidth required for higher sampling rates. Jitter requirements, however, do not change with the clock; they depend only upon the signal frequency and converter (ADC or DAC) resolution.

p.s. The output filter does not eliminate the steps, especially for lower frequencies. It is there primarily to suppress sampling images, and to reduce wideband noise and HF glitches. If you think about it, a 20 kHz filter cutoff is several decades above a 20 Hz signal...
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
Very interesting. It is just anecdotal, but I once owned a Sony X7ESD using the PCM63-K that I modified myself (better ICs, capacitors and some supply components, nothing else). For a long time after I let it go, I felt its replacements were just steps back. Yes, even allowing for DIY expectation bias, it really sounded great. :)

:) Dan Ariely writes enticingly about this DIY expectation bias in one of his books - 'The Upside of Irrationality'. He calls it 'The IKEA effect'.


But the main reason for my post is your statement "where it really matters, which is down at -20dBFS". Why do you feel that the most important zone of the scale is -20 dBFS?

Glad you picked up on that. It's not that I think -20 dBFS is the most important zone, rather that it's way more important than the 'headline' figure for DACs (at 0 dBFS). That's because music spends almost zero time at 0 dBFS - though I have recordings which, when loaded into Audacity, I was surprised to see are visibly clipped. So my view is really that everything from -20 dBFS on down is important; what's above -20 dBFS is considerably less important in THD terms, because music doesn't normally contain pure tones. Remember the datasheets characterize performance with sines, and these have a smaller crest factor than music. Combine ten equal-amplitude tones and the maximum any one tone can reach won't be much over -20 dBFS - it seems to me that multitone signals are more music-like and hence a much better way of characterizing a DAC designed to play music.
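That crest-factor point is easy to demonstrate numerically. A small numpy sketch (illustrative only; the tone frequencies and random seed are my own choices): sum ten equal-amplitude sines, normalize the mix so it peaks at 0 dBFS, and see where each individual tone ends up.

```python
import numpy as np

fs = 192_000
t = np.arange(fs) / fs                       # one second of samples
freqs = [997, 1907, 3089, 4201, 5503, 6701, 8101, 9391, 11003, 12503]

rng = np.random.default_rng(1)
phases = rng.uniform(0, 2 * np.pi, len(freqs))

# Ten equal-amplitude sines summed into one multitone test signal
mix = sum(np.sin(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))

# Scale the mix so its highest peak just touches 0 dBFS; each
# individual tone then sits at 20*log10(scale) dBFS
scale = 1.0 / np.max(np.abs(mix))
per_tone_dbfs = 20 * np.log10(scale)
print(f"each tone: {per_tone_dbfs:.1f} dBFS (worst case: -20.0 dBFS)")
```

With random phases the peaks rarely all align, so each tone lands a few dB above the worst-case -20 dBFS; if the ten peaks could coincide, each tone would sit exactly 20 dB below full scale.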
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
When digital audio is properly implemented - and it almost always is these days

As a digital audio implementer myself, I'm curious to see examples of what you consider 'proper' implementations - ideally ones where hi-res internal pictures can be found via Google. It might then be instructive for me to explain, for those present, where so many implementations are 'improper'.
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
Bleached harmonics and de-focused space can be heard by anyone who cares to listen.

Yes, it's kinda curious that so few actually do listen for this. IME bleaching has a fairly straightforward solution - much better control of RF noise, and keeping RF (even at seemingly low levels) out of audio ICs. Deane Jensen was, I believe, the first to publish in this regard - his AES paper on 'Spectral Contamination' is worth seeking out.
 

xiphmont

New Member
May 2, 2012
106
0
0
Somerville, MA
www.xiph.org
personal biases

Sometimes pingbacks are interesting...

I see that he has embellished it a ton more, no doubt to plug the deficiencies in the first version.

Whenever I get mail from a reader that indicates he was confused by what I wrote, I try to improve the wording; the point is education, after all. There's been no substantive change to the article since I first put it up, and I've updated the revision so that people know the article changed. I don't think anything about that is particularly untoward.

As to who he is, I know him exceptionally well, perhaps longer and earlier than most people

Um.... I don't think we've actually met...? You were effectively my counterpart/rival in the Windows Media group 'back in the day' and we exchanged a few testy emails.

dating back to shortly when ogg had come out (some 20 years ago).

Ogg Vorbis development began in October 1998. It saw first alpha in 2000, and I think I first sent you email in 2001. So... 11ish years.

I also know that he is a good salesman and doesn't mind pushing the line of what the reality is. When he first came out with Ogg, he used to claim it was a superior codec to others.

It was. Spanked WMA pretty hard in independent tests if I recall.

What he wouldn't say is that the default encoding mode was variable rate encoding, not fixed.

It was only a big bullet point feature, advertised as part of what made it superior, featured prominently in every interview I gave about it, the documentation, the manpage, and the runtime help.

amirm said:
So people would use it out of box that way

Yes, it worked properly without tinkering. So rare, it's unfair.

amirm said:
I went on the forum where he hung out and try as I might, I could not get him to accept this unfair set of comparisons.

Xiph.Org doesn't have any forums; we use mailing lists and IRC. I don't see any such complaints in my archives of either. But I'm probably just forgetting, got a URL... ?

amirm said:
He also had little understanding of patents -- at least at the time. He had this simplistic idea that if he had written something, and put it in open source, it could not be covered by anyone else's patents.

Oh, really? Now I really want to see a reference.

amirm said:
In this case, as I noted, he wrote what can at best be described as a much poorer and less authoritative version of Bob Stuart's AES paper as I explained in that thread. Unlike Bob's paper, his references are often forum threads, and anecdotal.

My audience was the semi-technical from unrelated fields. Busting out the 6.182 course notes wasn't going to convince anyone.

Bob is in my 'further reading' section.

amirm said:
Much of the knowledge that is needed to describe this field requires hardware and hands on design with things like DACs.

Mm, yes, quite a bit of hands-on hardware design experience I agree. Formal training doesn't hurt either.

amirm said:
BTW, Monty did not create FLAC. His organization adopted it after it was developed: http://en.wikipedia.org/wiki/FLAC.

This is correct. I wrote a piece of software called 'Squish' back in 1993. Josh Coalson later wrote a competing package named FLAC, and approached me about it (actually, I don't remember for sure who approached who... might have been brokered). His sourcebase was newer, better written, higher performance, so he agreed to join Xiph and we ditched Squish.

amirm said:
I used to live and breathe the space he comes from. It used to be my full-time career to manage such, including lots of hands-on work.

It's still mine. Do you have any specific complaints about the article to offer, Amir, or just the ad hominem?

Monty
Xiph.Org
 

bdiament

Member
Apr 26, 2012
196
0
16
New York area
Hi opus111,

Yes, it's kinda curious that so few actually do listen for this. IME bleaching has a fairly straightforward solution - much better control of RF noise, and keeping RF (even at seemingly low levels) out of audio ICs. Deane Jensen was, I believe, the first to publish in this regard - his AES paper on 'Spectral Contamination' is worth seeking out.

While I would agree with the points you raise, I think these are only some of many factors involved. Even in the absence of RF, recording low-level information with a 16-bit system is still going to represent that info with considerably less than 16-bit resolution. As you know, only the information in the top 6 dB (more like 6.02) is going to use all 16 bits. Most of the music, as well as any harmonic and spatial information in the recording, is going to be considerably lower in level and hence encoded at a lower resolution.

This can be heard in many older CDs during fades (done at the time, to 16-bit sources). As quantizing noise and distortion increases at those lower levels, the coarsening of the sound can be heard easily. Or, one could make a 16-bit recording, keeping the peaks down around -20 -- or even more easily heard, at lower levels still -- then raising the level on playback to make this even more obvious.

Of course, one "solution" would be to ensure that all the music in a recording stayed within those top ~6 dB. Then, we'd be assured of full 16-bit resolution. But there would be no Life left in the record.
Oh wait! That's what many of the major labels are doing with the loudness wars, isn't it? ;-}

I remember the first time I heard a comparison of a 24-bit recording with the same recording done at 16-bits. The former had a beautiful cello sound, complete with its wide ranging and quite complex harmonics; the latter sounded, in comparison, a bit like a kazoo. ;-}

Best regards,
Barry
www.soundkeeperrecordings.com
www.barrydiamentaudio.com
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
While I would agree with the points you raise, I think these are only some of many factors involved. Even in the absence of RF, recording low-level information with a 16-bit system is still going to represent that info with considerably less than 16-bit resolution.

Of course, even in a 24-bit system, fewer than 24 bits will actually be in use most of the time. And nobody has built a true 24-bit system so far, to my knowledge.

As you know, only the information in the top 6 bits (more like 6.02) is going to use all 16-bits.

I assume you meant in the top 6.02 dB, yes.

Most of the music, as well as any harmonic and spatial information in the recording is going to be considerably lower in level and hence, encoded at a lower resolution.

Yes, without a doubt.

This can be heard in many older CDs during fades (done at the time, to 16-bit sources). As quantizing noise and distortion increases at those lower levels, the coarsening of the sound can be heard easily.

Were these older CDs correctly dithered, do you know?

Of course, one "solution" would be to ensure that all the music in a recording stayed within those top ~6 dB. Then, we'd be assured of full 16-bit resolution. But there would be no Life left in the record.
Oh wait! That's what many of the major labels are doing with the loudness wars, isn't it? ;-}

I'm guessing you use the scare quotes because, like me, you consider such a 'solution' to be worse than the problem it's intended to solve.

I remember the first time I heard a comparison of a 24-bit recording with the same recording done at 16-bits. The former had a beautiful cello sound, complete with its wide ranging and quite complex harmonics; the latter sounded, in comparison, a bit like a kazoo. ;-}

The number of bits on an ADC's datasheet is no guarantee that those bits are real. In the case of '24-bit' converters for audio, they're almost certainly not all real engineering bits, but rather marketing ones. Thus the difference between these two isn't going to be as great as might be first thought. Perhaps the 16-bit converter just sucked?
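One standard way to put a number on those "marketing bits" is the effective-number-of-bits conversion from a measured SINAD figure (a generic formula, not tied to any particular converter; the 120 dB example below is hypothetical):

```python
def enob(sinad_db: float) -> float:
    """Effective number of bits implied by a measured SINAD,
    via the standard relation ENOB = (SINAD - 1.76) / 6.02."""
    return (sinad_db - 1.76) / 6.02

# Even a state-of-the-art '24-bit' audio converter with 120 dB SINAD:
print(f"{enob(120.0):.1f} effective bits")   # ~19.6 bits, not 24
```

By this yardstick a true 24-bit converter would need about 146 dB of SINAD, which no audio converter achieves.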
 

bdiament

Member
Apr 26, 2012
196
0
16
New York area
Hi opus111,

...I assume you meant in the top 6.02dB, yes.

Yes, thank you. I've corrected my post.


...Were these older CDs correctly dithered, do you know?...

I'm not so sure the original 1610/1630 system even incorporated dither. It was a 16-bit system. There was no longer-wordlength math, hence nothing to dither.


...I'm guessing you use the scare quotes because, like me, you consider such a 'solution' to be worse than the problem it's intended to solve.

Yes, though it seems to be quite a popular approach nowadays. ;-[


...The number of bits on an ADC's datasheet is no guarantee that those bits are real. In the case of '24-bit' converters for audio, they're almost certainly not all real engineering bits, but rather marketing ones. Thus the difference between these two isn't going to be as great as might be first thought. Perhaps the 16-bit converter just sucked?

Yes. This is true of software as well - there isn't much that is truly "bit clean" in the lower-order bits, in spite of the "24-bit" specification.
It doesn't say much for the so-called "32-bit" converters, does it? (Then again, perhaps it says a lot. ;-})

I consider myself fortunate to have in my room the most transparent converters in my experience, whether running at 16-bits or 24-bits.
While those old time converters were among the best in their day (there wasn't much competition), compared to today's best, they did indeed suck.
But even with today's best, a 16-bit recording will, to my ears, still suffer the slings and arrows of 16 bits. As we both noted, most of the time the music isn't at the top volume level - and harmonic and spatial information is well down from this even when the peaks approach 0 dBFS (as loud as digital can get). This is why, in my view, the best-sounding CDs and 16-bit files are created from longer-wordlength sources. In my experience, the result, with good dither/noise shaping applied, will preserve information from the longer-wordlength source that will simply not exist with a 16-bit source.
I am not aware of anyone still making original recordings at 16-bits and I believe this is why.

Best regards,
Barry
www.soundkeeperrecordings.com
www.barrydiamentaudio.com
 

opus111

Banned
Feb 10, 2012
1,286
3
0
Hangzhou, China
I'm not so sure the original 1610/1630 system even incorporated dither. It was a 16-bit system. There was no longer-wordlength math, hence nothing to dither.

Well, according to digital sampling theory, ±1 LSB TPDF (triangular PDF) dither should be used at the input to the quantizer (read: the ADC) to avoid noise modulation and quantization distortion. So with no dither I'd expect problems at lower levels, unless the ADC input stage has enough noise of its own to self-dither. I understand this was indeed the case with some anti-aliasing filter modules used in the olden days.
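For the curious, that effect is easy to reproduce. A minimal sketch (my own code, nothing from the 1610/1630 era): quantize a near-LSB-level sine to 16 bits, with and without ±1 LSB TPDF dither (the sum of two uniform dice) added before rounding.

```python
import numpy as np

def quantize16(x, dither, rng):
    """Quantize x (full scale +/-1.0) to 16 bits, optionally adding
    +/-1 LSB triangular-PDF dither (sum of two uniform draws) first."""
    lsb = 2.0 / 2**16
    if dither:
        x = x + (rng.uniform(-0.5, 0.5, x.shape)
                 + rng.uniform(-0.5, 0.5, x.shape)) * lsb
    return np.round(x / lsb) * lsb

fs = 48_000
t = np.arange(fs) / fs
x = 10 ** (-90 / 20) * np.sin(2 * np.pi * 997 * t)   # -90 dBFS tone, ~1 LSB

rng = np.random.default_rng(0)
for dith in (False, True):
    err = quantize16(x, dith, rng) - x
    print(f"dither={dith}: error RMS {np.sqrt(np.mean(err ** 2)):.2e}")
```

Undithered, the error is correlated with the signal (quantization distortion and noise modulation); dithered, it becomes a steady, signal-independent noise floor - slightly higher in RMS, but far more benign to the ear.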

Yes. This is true of software as well - there isn't that much that is truly "bit clean" in the lower order bits, in spite of the "24-bit" specification.

That would be inexcusable - for the software of today not to provide sufficient word length. Even 1990s-technology DSPs (the Motorola 56k) had a 24-bit word length.
 
