Conclusive "Proof" that higher resolution audio sounds different

Hi Amir, I guess I was addressing both you and John, sorry :)

To be honest I just think that too much is being made from the fact that you and others were able to identify differences between these and some of Ethan's files using Foobar.

Since then, there has been a lot of talk about defining the protocols necessary for blind-testing validity. The notion seems to be that, because these files weren't differentiated pre-Foobar by 'trained listeners', past ABX tests must have been invalid and their procedures flawed, and future ones will be too unless they use trained listeners under even more strictly controlled conditions than standard double-blind ABX testing already demands.

I'm not a trained listener, but I was able to differentiate some of Ethan's files using the Mac equivalent of Foobar, ABX Tester. Many others on the Pink Fish forum also did so, both with Ethan's files and the jangling keys; none are trained listeners, all did so without much effort or prior training, and none used very strict protocols.

Elsdude has intimated that the unavailability of current tools may explain the lack of positive results in past tests of these files, and that this may indicate that listening to very short clips, as Foobar allows, is useful. This warrants investigation, IMO. He and others have also intimated that there may be reasons why the jangling keys files sound different that aren't related to bit-depth.

So, basically, I'm suggesting that we keep things in perspective. I see no reason yet to totally rewrite ABX protocols, or to cast aspersions on blind testing past and present, based on the few data points the recent differentiation of these files provides.
 
In reply to your "Right now the amount of evidence we have is trivial, and it doesn't all go the same way," I replied, "& why would you expect all results to go the same way?"

Classic example of demeaning someone's intelligence and knowledge by putting words into their mouth that they never said. I never said that I expected all results to go the same way, but you said quite clearly that I did. Anybody with a basic knowledge of statistics knows that one of the purposes of statistics is to provide effective tools for exactly the situation where all of the results don't go the same way. The only logical conclusion that can be reached is that either you didn't know that, or you wished to suggest that I didn't.
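To make that statistical point concrete, here is a minimal sketch of how mixed ABX outcomes are typically summarised: a one-sided binomial test against the 50% guessing rate. The numbers are purely illustrative, not from any test discussed in this thread.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Illustrative example: 12 correct out of 16 trials.
print(f"p = {abx_p_value(12, 16):.4f}")  # ~0.038, below the usual 0.05 criterion
```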
 
amirm,

Since this type of thread (objective vs subjective) seems to dominate the "General Discussion" category way too often (and often leads to the "thread closed" outcome), and its relevance to "listening to / enjoying music" is at best tangential if not totally irrelevant to most folks, would you and the other mods consider creating an independent section that focuses on this topic, rather than allowing the ad nauseam / circular discussions that so often occur in the GD section?

Suggestion. ABX and other "controlled" listening tests.

Tim can be the moderator. :)

Just a thought.
Not a bad idea :). Let me discuss it with the other admins and see if we should do it.
 
I am not sure if you are addressing this to me but just in case :), I have repeatedly asked Arny and others to present one test, just one test, which complies with best practices of the industry/research community. None has been presented.

A deceptive statement that conveniently leaves out the other conditions that were forcibly demanded. Basically, if it was not published in a peer-reviewed journal, no go. An example of a widely referenced published DBT article that was not acceptable is this one: https://www.dropbox.com/s/28segllqrugpa7o/stereo review Amp_Sound.pdf?dl=0

...we have standards for such work called ITU BS1116. Yet he and others routinely reference tests that don't even come close to complying with these best practices.

As usual there is a lot of truth in the words that some people kinda sorta skip over saying. The first edition of BS 1116 was BS 1116-0, first published in 7/94. What is being asserted here is an ex post facto law: every test that did not comply with a standard created up to 20 years after the test was done is declared invalid.

Even Arny has a "top 10" list that is a subset of this.

Or a set that either encompasses it or overlaps it.


Do I throw the results out of hand if they don't comply? No.

Actually, yes. The means used to throw them all out is a concealed analysis that might be based on a lot of questionable judgements:

But I read through them and if I find failing after failing, then I will not reference their data as being valid.

I've never seen the details of the analysis that led to the above judgement, but I'm guessing that it was based on ignorance and was highly biased. That's partially because when the tests were done, BS1116 was not even a twinkle in someone's eye! So meeting its terms was not a goal of the documentation that was done before BS1116 was published. Only people who were part of the tests (like me) know the things that were done that weren't documented.

I guarantee you that since Amir was still basing his conclusions on sighted evaluations in articles he was writing back in say 2008 or even later, he wasn't present at DBTs that were done in 1975, 1985, 1995, or even 2005.
 
I'm not a trained listener, but I was able to differentiate some of Ethan's files using the Mac equivalent of Foobar, ABX Tester. Many others on the Pink Fish forum also did so, both with Ethan's files and the jangling keys; none are trained listeners, all did so without much effort or prior training, and none used very strict protocols.
Such reports were unheard of until now. What changed? Knowing that others can get positive results. People were so biased that no one really critically listened in these tests. This, by the way, is the first step in becoming a trained listener. By getting positive results, your hearing acuity is now above that of many more people than before you attempted the test.

These are the developments that I am saying are unique and new on forums. DIYers who are unaware of such factors get people to run these tests cold. Pre-training with controls that are easier tests serves the same purpose: the tester knows that in a more degraded situation he could tell the difference, so he tries hard and focuses on critical segments to hear them again. I bet that if I had not passed these tests, you and others would not have either. By the same token, those past tests would have garnered negative outcomes because of the failings in this regard.

Arny talks about fraud. Trust me, there are orders of magnitude more ways to game the test so that it generates negative outcomes than positive.

Elsdude has intimated that the unavailability of current tools may explain the lack of positive results in past tests of these files, and that this may indicate that listening to very short clips, as Foobar allows, is useful. This warrants investigation, IMO. He and others have also intimated that there may be reasons why the jangling keys files sound different that aren't related to bit-depth.

So, basically, I'm suggesting that we keep things in perspective. I see no reason yet to totally rewrite ABX protocols, or to cast aspersions on blind testing past and present, based on the few data points the recent differentiation of these files provides.
ABX protocols don't need rewriting, as ABX is just a method of testing. What needs to happen is for people to properly follow best-practices protocols for running blind tests in general, giving them an equal chance to produce positive and negative outcomes. As a person in the industry, this outcome was nothing out of the ordinary. We routinely use specific segments, trained listeners, critical material, knowledge of the algorithms, group training, etc. It is within the confines of forums where none of these are understood as being essential for proper tests.

We have gone so far that people see the words "DBT ABX" and they are ready to mortgage their home based on their results :). Add to that "peer reviewed" and they are willing to bet their life on them. :D Terms like trained listeners, critical segments, pre-training, etc. have not even been part of the lexicon of these discussions. But now they are. So we have advanced forward.

And if we accept these advances, then by definition we need to recognize their absence in prior tests as being very material. And ignoring them moving forward is still a sin :). So while I see your point, in this business we need to be hardcore to get reliable results. Cutting corner after corner just won't do. Having people run tests who lack proper experience is also a recipe for improper results.
 
A deceptive statement that conveniently leaves out the other conditions that were forcibly demanded. Basically, if it was not published in a peer-reviewed journal, no go. An example of a widely referenced published DBT article that was not acceptable is this one: https://www.dropbox.com/s/28segllqrugpa7o/stereo review Amp_Sound.pdf?dl=0.
I have never asked for peer review, nor will I use that to disqualify any test. Are you putting that test forward for us to analyze as following best practices in these tests? If so, I am happy to analyze it.
 
Max, I hope from ArnyK's response to you that you can now understand what has been told to you over & over again: these test criteria are commonly known and shared by anybody in this field who is serious about such testing. They are the criteria laid out in the various BS1116 documents to which you have been directed numerous times.

ArnyK's response to you:
"JJ's list isn't just JJ's list, its JJ's statement of a set of goals that many of us hold in common. Here's my version of the same basic ideas, written in my own words, but I believe first posted on the web some years earlier (year 2000):

Ten (10) Requirements For Sensitive and Reliable Listening Tests"
 
As usual there is a lot of truth in the words that some people kinda sorta skip over saying. The first edition of BS 1116 was BS 1116-0, first published in 7/94. What is being asserted here is an ex post facto law: every test that did not comply with a standard created up to 20 years after the test was done is declared invalid.

I've never seen the details of the analysis that led to the above judgement, but I'm guessing that it was based on ignorance and was highly biased. That's partially because when the tests were done, BS1116 was not even a twinkle in someone's eye!
Really? The Meyer and Moran test was performed in 2007, a whopping 13 years after the original version of BS1116. Yet Meyer and Moran ignored almost every one of the recommendations in that document.

When it comes to the topic of high resolution audio, Meyer and Moran is the most referenced test. The paper has only four external references. None of them are ITU BS1116. It does reference this paper however:

[4] D. Blech and M. Yang, "DVD-Audio versus SACD: Perceptual Discrimination of Digital Coding Formats," presented at the 116th Convention of the Audio Engineering Society, Berlin, Germany, 2004 May 8-11, convention paper 6086.


If you read that paper, you see this reference:

[3] ITU Radiocommunication Assembly, 1997. "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems," Recommendation ITU-R BS.1116-1: 1994-1997.

The researchers from the University of Music Detmold knew the ITU recommendations. Yet the hobbyists who say they have read and referenced the paper didn't. One wonders if they actually read the paper or were going by its Reader's Digest version posted in online forums.

Even if BS1116 did not exist, anyone who had done these tests professionally or in research would have learned the value of simple things like measuring the spectrum of the music instead of assuming it was high resolution. Mistakes don't come more basic than this.
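For anyone wondering what that sanity check looks like in practice, here is a minimal sketch: measure how much energy actually sits above the CD band rather than trusting the label. The file name is hypothetical, and the threshold of 22.05 kHz is simply the CD Nyquist frequency.

```python
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("hires_master_96k.wav")      # hypothetical 96 kHz file
x = x[:, 0].astype(np.float64) if x.ndim > 1 else x.astype(np.float64)

# Windowed power spectrum of the whole file.
spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

# Share of total energy above the CD band.
ultrasonic = spectrum[freqs > 22050].sum() / spectrum.sum()
print(f"Fraction of energy above 22.05 kHz: {ultrasonic:.2e}")
```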
 
Well, no need to throw the baby out with the bathwater. You could still use the ABX software as long as you make sure it has time-aligned samples to start with. A bother, but not that big of a bother.

While time alignment is necessary, it is not sufficient. This can be seen by looking at the Fourier expansion of a square wave and filtering off all of the harmonics, leaving a sine wave. This will increase the peak amplitude by a factor of 4/pi, about 2.1 dB, but there will be no change of time alignment between the two waveforms. If the "start" and "end" markers are set to allow playback of a single sample, then the resulting "click" will differ in amplitude by an audible amount, even if both waveforms were audibly indistinguishable (e.g. a 9 kHz square wave has its lowest harmonic above the fundamental at 27 kHz). This problem can be effectively eliminated by adding a slow fade-in and fade-out to the playback interval, e.g. an exponential fade with a time constant of 0.1 second.

Existing sample rate converters, including the best ones, have sub-sample delays. If we want to evaluate the sonic potential of various PCM formats we will have to avoid adding unnecessary requirements that rule out some of the most highly regarded sample rate converters, such as the iZotope 64 bit SRC. This SRC has subsample delays even when using linear phase filters (private communication with the author of the software). If minimum phase filters are used, as some suggest produce better sound, then delays will be frequency dependent.
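For what it's worth, the 4/pi peak factor works out to 20*log10(4/pi), roughly 2.1 dB. And here is a minimal sketch of the suggested exponential fade, assuming a mono clip and a 0.1 s time constant; it is my own illustration of the idea, not code from any particular ABX tool.

```python
import numpy as np

def apply_exponential_fades(segment, fs, tau=0.1):
    """Shape a clip's start and end so a single-sample level step can't produce a click."""
    t = np.arange(len(segment)) / fs
    fade_in = 1.0 - np.exp(-t / tau)             # rises smoothly from 0 toward 1
    fade_out = 1.0 - np.exp(-(t[-1] - t) / tau)  # falls smoothly back to 0 at the end
    return segment * fade_in * fade_out
```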
 
I have never asked for peer-review, nor will use that to disqualify any test. Are you putting that test forward for us to analyze as following best practices in these tests? If so I am happy to analyze it.

There is no way that anybody who was not intimately familiar with the running of the test could reasonably analyze its BS 1116 conformance from the existing documentation. As I have already explained once tonight, and apparently had my explanation completely ignored, the write-up was not written to satisfy BS 1116 because BS 1116 did not exist for about another decade.
 
ABX protocols don't need rewriting, as ABX is just a method of testing. What needs to happen is for people to properly follow best-practices protocols for running blind tests in general, giving them an equal chance to produce positive and negative outcomes. As a person in the industry, this outcome was nothing out of the ordinary. We routinely use specific segments, trained listeners, critical material, knowledge of the algorithms, group training, etc. It is within the confines of forums where none of these are understood as being essential for proper tests.

We have gone so far that people see the words "DBT ABX" and they are ready to mortgage their home based on their results :). Add to that "peer reviewed" and they are willing to bet their life on them. :D Terms like trained listeners, critical segments, pre-training, etc. have not even been part of the lexicon of these discussions. But now they are. So we have advanced forward.

And if we accept these advances, then by definition we need to recognize their absence in prior tests as being very material. And ignoring them moving forward is still a sin :). So while I see your point, in this business we need to be hardcore to get reliable results. Cutting corner after corner just won't do. Having people run tests who lack proper experience is also a recipe for improper results.

The above statements seem to make the same mistake that I keep pointing out again and again. They demonize DBTs by holding them to a far higher standard than any other kind of listening test. In fact the requirements for listener selection, training, etc. apply to every kind of listening test. The main reason that these same requirements aren't being recommended for sighted evaluations is that sighted evaluations are so tremendously susceptible to false positives.

While I've claimed the invention of the audio flavor of the ABX test as being something new and innovative at the time, the far more impressive innovation of mine at the time was the invention of the listening test with a negative outcome. ;-)
 
Really? The Meyer and Moran test was performed in 2007, a whopping 13 years after the original version of BS1116. Yet Meyer and Moran ignored almost every one of the recommendations in that document.

Nice deflection by means of unilaterally changing the topic. The fact is that I had repeatedly referred you to the Stereo Review amplifier tests when you wrote:

"I have repeatedly asked Arny and others to present one test, just one test, which complies with best practices of the industry/research community. None has been presented."

Are you going to be accountable for what you just wrote or are you going to unilaterally change the subject every time I mention this?
 
While time alignment is necessary, it is not sufficient. This can be seen by looking at the Fourier expansion of a square wave and filtering off all of the harmonics, leaving a sine wave. This will increase the peak amplitude by a factor of 4/pi, about 2.1 dB, but there will be no change of time alignment between the two waveforms. If the "start" and "end" markers are set to allow playback of a single sample, then the resulting "click" will differ in amplitude by an audible amount, even if both waveforms were audibly indistinguishable (e.g. a 9 kHz square wave has its lowest harmonic above the fundamental at 27 kHz). This problem can be effectively eliminated by adding a slow fade-in and fade-out to the playback interval, e.g. an exponential fade with a time constant of 0.1 second.

Existing sample rate converters, including the best ones, have sub-sample delays. If we want to evaluate the sonic potential of various PCM formats we will have to avoid adding unnecessary requirements that rule out some of the most highly regarded sample rate converters, such as the iZotope 64 bit SRC. This SRC has subsample delays even when using linear phase filters (private communication with the author of the software). If minimum phase filters are used, as some suggest produce better sound, then delays will be frequency dependent.

Sox corrects for those timing shifts, according to comments I have seen. The results certainly look like it does. I have read hearsay that iZotope is considering adding that to their software just for better nulling in procedures like this.

Here is a bounce from 96/24 to 44/24 and back of Arny's jangling keys. The residuals are down around -175 dB. Considering the bin size of the FFT, the level for the 20 Hz-20 kHz band would be around -135 to -140 dB.
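For readers unfamiliar with the bin-size correction being used there: assuming the residual is roughly flat across the band, a per-bin FFT level converts to a band level as

$$L_{\text{band}} \approx L_{\text{bin}} + 10\log_{10}\!\left(\frac{BW_{\text{band}}}{BW_{\text{bin}}}\right)$$

With, say, a 65,536-point FFT at 96 kHz (an assumption; the post doesn't state the size), the bin width is about 1.46 Hz, the 20 Hz-20 kHz band spans roughly 13,600 bins, and the correction is about +41 dB, which puts a -175 dB per-bin floor at around -134 dB over the band, consistent with the figure quoted above.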

Also shown is a 9 kHz square wave bounced from 96 to 48 and back to 96, nulled against the original. Residuals over the 20 Hz-20 kHz range are again around that -140 dB level. The bin size is different just to get the ultrasonic harmonics to show up on screen. Not shown is the level of the 9 kHz portion in the FFT. The original file was a -6 dB square wave which had a 9 kHz level in the FFT of -8.3 dB. The bounce to 48 and back to 96 also had a 9 kHz component at -8.3 dB. The bounce to 44 and back to 96 had some other artifacts, but the 9 kHz component was down only 0.1 dB in the resampled version.

No resampler is completely perfect or there would be zero residual artifacts at any level. But Sox and some like it seem to come close enough for our purposes. If we still need exponential fade in and out, again a bother, but not beyond preconditioning test signals.
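Here is a minimal sketch of that bounce-and-null check, using scipy's polyphase resampler rather than SoX (with SoX the round trip would be done with its high-quality `rate` effect instead). The 96 kHz source file name is hypothetical.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

fs, x = wavfile.read("keys_96k.wav")       # hypothetical 96 kHz source
x = x.astype(np.float64)
x /= np.max(np.abs(x))                     # normalise to full scale

down = resample_poly(x, up=1, down=2)      # 96 kHz -> 48 kHz
back = resample_poly(down, up=2, down=1)   # 48 kHz -> 96 kHz
back = back[: len(x)]                      # match lengths

residual = x - back                        # the "null"
rms_db = 20 * np.log10(np.sqrt(np.mean(residual ** 2)) + 1e-30)
print(f"Broadband residual: {rms_db:.1f} dBFS")
```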

[Attachments: Arny 24 bit null.jpg, 9khz square null.jpg]
 
There is no way that anybody who was not intimately familiar with the running of the test could reasonably analyze its BS 1116 conformance from the existing documentation. As I have already explained once tonight, and apparently had my explanation completely ignored, the write-up was not written to satisfy BS 1116 because BS 1116 did not exist for about another decade.
So you have no test you can put forward which complies with best practices of the industry/research community. Right?
 
The above statements seem to make the same mistake that I keep pointing out again and again. They demonize DBTs by holding them to a far higher standard than any other kind of listening test.
That "demonizes DBTs?" We should allow any and all corners to be cut lest we are called on demonizing DBTs? No Arny, it doesn't work that way. We have to at all times strive for the best, whether it is in developing technology or improving the reliability of our listening tests. Talking about excellence there is just that: excellence. Sure, they will be people who strive for "good enough" or some other lower standard but please don't ask me to go there.

BTW, this is what you said on your now defunct web site back in year 2000:

This web site encourages conformance with applicable international technical standards organizations publications including ITU Recommendation BS 1116-1. You can find out more about this document at http://www.itu.int/itudoc/itu-r/rec/bs/1116-1.html . The approximate cost of this 26 page document in MS Word or PDF format is $12.00 US as of 10/1/2000.

You were encouraging the same best practices I am talking about then, but now it is called demonization?

In fact the requirements for listener selection, training, etc. apply to every kind of listening test. The main reason that these same requirements aren't being recommended for sighted evaluations is that sighted evaluations are so tremendously susceptible to false positives.
Arny, once more, we are not talking about sighted tests. The discussion is 100% about double blind test as created by you, and your sanctioning of other double blind tests that are flawed in many ways. It is those flaws which we are discussing.

While I've claimed the invention of the audio flavor of the ABX test as being something new and innovative at the time, the far more impressive innovation of mine at the time was the invention of the listening test with a negative outcome. ;-)
You have indeed claimed that. As you know though, there is no historical record of you being the sole inventor of ABX. The paper by Clark on ABX in JAES says nothing about you being the inventor at all. And the only other record says you were a co-inventor of the comparator: http://home.provide.net/~djcarlst/abx.htm

"Thus we credit Arny Krueger and his opponent in the argument, Bern Muller, as the inventors of the ABX Comparator. "


That aside, the Arny of 1984 was proud to prove amplifiers can sound different. The Arny we have now can't stand it when even his own test and content show a positive outcome, and he is doing his best to discredit it.
 
snippage..................

Also shown is a 9 kHz square wave bounced from 96 to 48 and back to 96, nulled against the original. Residuals over the 20 Hz-20 kHz range are again around that -140 dB level. The bin size is different just to get the ultrasonic harmonics to show up on screen. Not shown is the level of the 9 kHz portion in the FFT. The original file was a -6 dB square wave which had a 9 kHz level in the FFT of -8.3 dB. The bounce to 48 and back to 96 also had a 9 kHz component at -8.3 dB. The bounce to 44 and back to 96 had some other artifacts, but the 9 kHz component was down only 0.1 dB in the resampled version.

No resampler is completely perfect or there would be zero residual artifacts at any level. But Sox and some like it seem to come close enough for our purposes. If we still need exponential fade in and out, again a bother, but not beyond preconditioning test signals.

[Attachment 16986]


My apologies, I made a slight mistake in my previous FFT of the 9 kHz square wave: I had actually made a sawtooth wave. The additional harmonics should have been the tip-off that something was wrong. Repeating with a square wave this time, I got the following. The -6 dB square wave had a 0.9 dB 9 kHz component. Once bounced to 48/24 and back to 96 it still had a 0.9 dB component. Residuals were still low, roughly in the same range. Even at 9 kHz the signal nulled to the -150 dB range.
[Attachment: 9khz square null corrected.jpg]
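Why the extra harmonics were the tip-off is easy to see from the standard Fourier series of the two waveforms (my addition, just to make the remark explicit):

$$x_{\text{square}}(t) = \frac{4}{\pi}\sum_{k\,\mathrm{odd}} \frac{\sin(2\pi k f t)}{k}, \qquad x_{\text{saw}}(t) = \frac{2}{\pi}\sum_{k\ge 1} (-1)^{k+1}\,\frac{\sin(2\pi k f t)}{k}$$

So for f = 9 kHz sampled at 96 kHz, a square wave has in-band components only at 9, 27 and 45 kHz, while a sawtooth adds even-harmonic lines at 18 and 36 kHz; those extra lines are what gave the earlier FFT away.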
 
Regarding BS1116 -- A) it's a recommendation (for subjective evaluation, by the way), not a standard. B) Has anyone ever actually followed this recommendation? Ever? Who? In testing what? It's a nice, pretty comprehensive set of guidelines, but you'd never get audiophiles to agree on the basics -- appropriate source material and playback system.

It is being held up here as a standard that any result can be dismissed for failing to follow. Where has it been used?

Tim
 
This thread is, unfortunately, going nowhere fast. Which is too bad, because the initial discussion was really good.
 
Regarding BS1116 -- A) it's a recommendation (for subjective evaluation, by the way), not a standard.

Ah, nothing like a voice of reason! Yup, BS1116 is a recommendation, not a standard, no matter how much some people want to use it as if it were a standard by means of which all non-complying tests can be discarded.

Furthermore, if BS1116 were applied as a standard to all of the ABX tests being discussed in this thread, they would all have to be dismissed, because BS1116 specifies ABC/HR as the DBT methodology, not ABX.
 
That "demonizes DBTs?" We should allow any and all corners to be cut lest we are called on demonizing DBTs?

Excluded middle argument. What I'm saying is that it would be cool if the self-appointed DBT auditors in this thread would devote themselves with equal energy to their own and others' sighted evaluations.

No Arny, it doesn't work that way. We have to at all times strive for the best, whether it is in developing technology or improving the reliability of our listening tests.

Where was that thinking when you were pushing the results of your sighted evaluations, Amir?

Talking about excellence there is just that: excellence. Sure, there will be people who strive for "good enough" or some other lower standard, but please don't ask me to go there.

Where was that thinking when you were pushing the results of your sighted evaluations, Amir?

BTW, this is what you said on your now defunct web site back in year 2000:

This web site encourages conformance with applicable international technical standards organizations publications including ITU Recommendation BS 1116-1. You can find out more about this document at http://www.itu.int/itudoc/itu-r/rec/bs/1116-1.html . The approximate cost of this 26 page document in MS Word or PDF format is $12.00 US as of 10/1/2000.

It would appear that I have a 40+ year history of doing as excellent a job of that as all but a few other living humans, given how many of those who would love to criticise my work were basing everything they did on sighted evaluations just a few years ago.

Arny, once more, we are not talking about sighted tests.

Interesting how so many want to sweep their own personal recent history under the table.
 
