Conclusive "Proof" that higher resolution audio sounds different

Orb

New Member
What they used is part of the workflow that professionals use to produce the music we get. SoX, iZotope, etc. are not commonly used by pros, so they are not part of the music chain we receive.

What is fascinating is that after so many years, it is only now, through this testing, that we realize the files are not level matched, time sync'ed, etc. How come we assumed they were and challenged people to tell them apart?

Even with the flaws, hardly anyone has posted ABX tests of Scott/Mark files. This shows that trained listeners do exist and can outperform other listeners. Therefore tests that did not use trained listeners are not reliable.
And it is interesting how, over the years, the vocal audio-ABX hobbyists put forward that no one passed these types of tests with ABX, even when some of those tests would have had some of these variables and a few others :)

Cheers
Orb
 

esldude

New Member
What they used is part of the workflow that professionals use to produce the music we get. SoX, iZotope, etc. are not commonly used by pros, so they are not part of the music chain we receive.

What is fascinating is that after so many years, it is only now, through this testing, that we realize the files are not level matched, time sync'ed, etc. How come we assumed they were and challenged people to tell them apart?

Even with the flaws, hardly anyone has posted ABX tests of Scott/Mark files. This shows that trained listeners do exist and can outperform other listeners. Therefore tests that did not use trained listeners are not reliable.

It would be interesting to see if you could successfully ABX a bounce to 44/24 and back, to see if the dither is what you are perceiving.
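To make that bounce concrete, here is a minimal sketch, assuming Python with scipy and soundfile (my choice, not a tool anyone in the thread used) and a hypothetical input file name. Note that the 24-bit requantization here is undithered; a converter like SoX can add dither, which is the variable being asked about.

```python
# Round-trip "bounce": 96 kHz -> 44.1 kHz -> 96 kHz, written back at 24-bit.
# 44100/96000 reduces to the integer ratio 147/320.
import soundfile as sf
from scipy.signal import resample_poly

x, fs = sf.read("original_96k.wav")                   # hypothetical file name
assert fs == 96_000

down = resample_poly(x, up=147, down=320, axis=0)     # 96 kHz -> 44.1 kHz
back = resample_poly(down, up=320, down=147, axis=0)  # 44.1 kHz -> 96 kHz

# libsndfile quantizes to 24-bit without dither at this step.
sf.write("bounced_96k.wav", back, fs, subtype="PCM_24")
```

ABXing bounced_96k.wav against the original isolates the conversion itself from any sample-rate or bit-depth mismatch at playback.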

Just looking at the Infinite Wave site with a few comparisons: Audition (assuming the CC version hasn't regressed) gives results very much like Audacity 2.0.3 on the tests at http://src.infinitewave.ca/
 

jkeny

Industry Expert, Member Sponsor
Ireland
.....

What is fascinating is that after so many years, it is only now, through this testing, that we realize the files are not level matched, time sync'ed, etc. How come we assumed they were and challenged people to tell them apart?

Even with the flaws, hardly anyone has posted ABX tests of Scott/Mark files. This shows that trained listeners do exist and can outperform other listeners. Therefore tests that did not use trained listeners are not reliable.

And it is interesting how, over the years, the vocal audio-ABX hobbyists put forward that no one passed these types of tests with ABX, even when some of those tests would have had some of these variables and a few others :)

Cheers
Orb

Yes, totally agree - I have said the same.
 

Tony Lauck

New Member
Invalidity of Tests done with PC ABX due to lack of fade-in / fade-out

Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)

For example, I created two files at 96/24 consisting of 10 seconds of a -12 dBFS 1000 Hz tone, slowly fading in and out (0.1 second fade with SoundForge). Then I added 1/4 cycle of silence (24 samples) to the beginning of one file and to the end of the other. Playing the two files all the way through, they sounded identical using PC ABX. However, if I selected an arbitrary interval in the middle of the file, so as to avoid the fade-in, then after two or three tries I found a setting where the same interval produced an obviously different starting click and so was able to ABX 100% reliably.
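To make the construction concrete, here is a minimal sketch that approximates the two files Tony describes, assuming Python with numpy and soundfile (he used SoundForge, and the linear fade shape here is my guess); the output file names are hypothetical.

```python
# Two 96/24 files: a -12 dBFS 1 kHz tone with 0.1 s fades, identical except
# that one has 24 samples (1/4 cycle, 250 us) of silence prepended and the
# other has the same silence appended.
import numpy as np
import soundfile as sf

fs, f0, dur = 96_000, 1_000, 10.0
amp = 10 ** (-12 / 20)                      # -12 dBFS peak

t = np.arange(int(fs * dur)) / fs
tone = amp * np.sin(2 * np.pi * f0 * t)

fade = int(0.1 * fs)                        # 0.1 s linear fade in and out
tone[:fade] *= np.linspace(0.0, 1.0, fade)
tone[-fade:] *= np.linspace(1.0, 0.0, fade)

shift = fs // (4 * f0)                      # 1/4 cycle = 24 samples
pad = np.zeros(shift)

sf.write("tone_a.wav", np.concatenate([pad, tone]), fs, subtype="PCM_24")
sf.write("tone_b.wav", np.concatenate([tone, pad]), fs, subtype="PCM_24")
```

Starting playback at an arbitrary sample inside the tone then hits different instantaneous values in the two files, which is what produces the different start clicks.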

Using the same technique with the jangling key files, I found start points where it was trivial to ABX the two files, even though I had trouble hearing a difference when playing them all the way through.

As this is my first post, I will refrain from making any further comments that might be considered contentious or personal or which might violate the terms of service of this forum. (I was going to post this on the AVS forum, but I see that the thread was locked.)
 

Andre Marc

Member Sponsor
San Diego
www.avrev.com
Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)

For example, I created two files at 96/24 consisting of 10 seconds of a -12 dBFS 1000 Hz tone, slowly fading in and out (0.1 second fade with SoundForge). Then I added 1/4 cycle of silence (24 samples) to the beginning of one file and to the end of the other. Playing the two files all the way through, they sounded identical using PC ABX. However, if I selected an arbitrary interval in the middle of the file, so as to avoid the fade-in, then after two or three tries I found a setting where the same interval produced an obviously different starting click and so was able to ABX 100% reliably.

Using the same technique with the jangling key files, I found start points where it was trivial to ABX the two files, even though I had trouble hearing a difference when playing them all the way through.

As this is my first post, I will refrain from making any further comments that might be considered contentious or personal or which might violate the terms of service of this forum. (I was going to post this on the AVS forum, but I see that the thread was locked.)

Hello Tony, and welcome. I know you to be a prolific, knowledgeable, and informed poster on various forums.

Folks, we are lucky to have Tony.
 

treitz3

Super Moderator
Staff member
The tube lair in beautiful Rock Hill, SC
Welcome to the forum, Mr. Tony Lauck. Nice introductory post! ;)

Tom
 

esldude

New Member
Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)

For example, I created two files at 96/24 consisting of 10 seconds of a -12 dBFS 1000 Hz tone, slowly fading in and out (0.1 second fade with SoundForge). Then I added 1/4 cycle of silence (24 samples) to the beginning of one file and to the end of the other. Playing the two files all the way through, they sounded identical using PC ABX. However, if I selected an arbitrary interval in the middle of the file, so as to avoid the fade-in, then after two or three tries I found a setting where the same interval produced an obviously different starting click and so was able to ABX 100% reliably.

Using the same technique with the jangling key files, I found start points where it was trivial to ABX the two files, even though I had trouble hearing a difference when playing them all the way through.

As this is my first post, I will refrain from making any further comments that might be considered contentious or personal or which might violate the terms of service of this forum. (I was going to post this on the AVS forum, but I see that the thread was locked.)

Also, a welcome to Mr. Lauck.

I found the same thing myself. I also found that using Audacity there are no timing issues. I don't know about others, but current Audacity, SoX, and the SoX plugin for Foobar don't have this issue upon resampling. Once Arny's file was resampled this way it was no longer detectable by me. Audition may or may not cause a timing shift; I don't have it handy to try out myself.
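As an illustration of how one might check a resampled pair for such a timing shift (my own sketch, not esldude's actual procedure; file names are hypothetical), the lag of the cross-correlation peak gives the sample offset:

```python
# Estimate the sample offset between two nominally identical files.
import numpy as np
import soundfile as sf
from scipy.signal import correlate

a, fs = sf.read("file_a.wav")               # hypothetical file names
b, _ = sf.read("file_b.wav")
if a.ndim > 1:                              # use left channel if stereo
    a, b = a[:, 0], b[:, 0]
n = min(len(a), len(b))
a, b = a[:n], b[:n]

xc = correlate(a - a.mean(), b - b.mean(), mode="full", method="fft")
lag = int(np.argmax(xc)) - (n - 1)          # 0 means sample-aligned
print(f"offset: {lag} samples ({1e6 * lag / fs:.1f} us)")
```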
 

arnyk

New Member
So, ArnyK's tracks had a level difference (0.2 dB, I believe?) but only about half a sample, 13 µs, of mismatch when I tested them.

I question both results. The level difference between my two files is huge, about 2 dB - because of the extreme ultrasonic content of the 24/96 file that obviously had to disappear in the downsampled version.

The idea that a resampled file needs to have the same level as the original file is an error. They should be reasonably well matched if both are measured for content < 20 kHz.

One of the fundamental ideas of audio measurements is that level measurements of wideband and/or incoherent signals are only valid if a relevant measurement bandwidth is stated, and they are only comparable if the measurement bandwidths are the same.

The approximately one-sample difference due to the resampling is a nit. People will hear a mismatch that measures in the milliseconds, but not one that measures in the microseconds. The delay is due to prophylactic brick-wall filtering in order to minimize the possibility of IM.
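To illustrate the measurement-bandwidth point, here is a minimal sketch, assuming Python with scipy and soundfile and hypothetical file names: both files are low-passed to the same sub-20 kHz band before their RMS levels are compared, so the 24/96 file's ultrasonic content no longer inflates its reading.

```python
# Band-limited RMS level: low-pass both files to < 20 kHz, then compare.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt

def rms_db_below_20k(path):
    x, fs = sf.read(path)
    if x.ndim > 1:
        x = x[:, 0]                         # left channel only
    sos = butter(8, 20_000, btype="low", fs=fs, output="sos")
    y = sosfilt(sos, x)
    return 20 * np.log10(np.sqrt(np.mean(y ** 2)))

# hypothetical file names
delta = rms_db_below_20k("hires_96k.wav") - rms_db_below_20k("downsampled_44k.wav")
print(f"in-band level difference: {delta:.2f} dB")
```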
 

arnyk

New Member
Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)

For example, I created two files at 96/24 consisting of 10 seconds of a -12 dBFS 1000 Hz tone, slowly fading in and out (0.1 second fade with SoundForge). Then I added 1/4 cycle of silence (24 samples) to the beginning of one file and to the end of the other. Playing the two files all the way through, they sounded identical using PC ABX. However, if I selected an arbitrary interval in the middle of the file, so as to avoid the fade-in, then after two or three tries I found a setting where the same interval produced an obviously different starting click and so was able to ABX 100% reliably.

Using the same technique with the jangling key files, I found start points where it was trivial to ABX the two files, even though I had trouble hearing a difference when playing them all the way through.

As this is my first post, I will refrain from making any further comments that might be considered contentious or personal or which might violate the terms of service of this forum. (I was going to post this on the AVS forum, but I see that the thread was locked.)

In its way it is a good post. However, the basic thesis:

"
Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)
"

Utterly lacks much-needed generality.

The problem is not related to just FooBar2000.

The problem is not related to just ABX tests.

The far more general and useful statement is this: any listening test that is more sensitive to audible effects is also more prone to distract, by uncovering what may be a trivial effect unrelated to the investigation at hand.

In a way we may have uncovered the audio equivalent of the Uncertainty Principle: The smaller the difference of a kind that you try to hear, the more likely you are to hear a difference that is actually something else.

Please footnote when you put it into your next Thesis Project. ;-)
 

jkeny

Industry Expert, Member Sponsor
Ireland
I question both results. The level difference between my two files is huge, about 2 dB - because of the extreme ultrasonic content of the 24/96 file that obviously had to disappear in the downsampled version.

The idea that a resampled file needs to have the same level as the original file is an error. They should be reasonably well matched if both are measured for content < 20 kHz.

One of the fundamental ideas of audio measurements is that level measurements of wideband and/or incoherent signals are only valid if a relevant measurement bandwidth is stated, and they are only comparable if the measurement bandwidths are the same.
If you followed the conversation on this thread you would realise that the level difference being talked about is in the audio band - otherwise it would make no audible difference & therefore would not invalidate the ABX test, i.e. the listener being able to use this tell to identify X. As J_J said on AVS: "0.2 dB should be detectable, but not as a level difference (it comes across as a difference in quality, or other sensation than level for nearly everyone, well, everyone I know of, at least), rather as just a difference, if the listening room is quiet, the equipment is good, the listener has reasonably normal hearing, and the listener has rather a lot of practice." Don't know if some similar psychoacoustic effect applies to the timing difference below?

The approximately one-sample difference due to the resampling is a nit. People will hear a mismatch that measures in the milliseconds, but not one that measures in the microseconds. The delay is due to prophylactic brick-wall filtering in order to minimize the possibility of IM.
Maybe it's not audible, maybe it is? Some here are stating that there was/is no need for any timing difference when using a modern resampler, anyway.
 

amirm

Banned
Seattle, WA
Any tests done with PC ABX/Foobar2000 are potentially invalid. The problem is that if the two files have a small sample shift, or some other cause gives them different DC components at a given sample point due to a slight time shift, then the two files may sound different with PC ABX. (Example: there will be a perceptible click on one file and a different-sounding click on the other file.)

For example, I created two files at 96/24 consisting of 10 seconds of a -12 dBFS 1000 Hz tone, slowly fading in and out (0.1 second fade with SoundForge). Then I added 1/4 cycle of silence (24 samples) to the beginning of one file and to the end of the other. Playing the two files all the way through, they sounded identical using PC ABX. However, if I selected an arbitrary interval in the middle of the file, so as to avoid the fade-in, then after two or three tries I found a setting where the same interval produced an obviously different starting click and so was able to ABX 100% reliably.

Using the same technique with the jangling key files, I found start points where it was trivial to ABX the two files, even though I had trouble hearing a difference when playing them all the way through.

As this is my first post, I will refrain from making any further comments that might be considered contentious or personal or which might violate the terms of service of this forum. (I was going to post this on the AVS forum, but I see that the thread was locked.)
All disagreements are welcome Tony. We simply draw the line when it gets personal rather than discussing the technical topic.

If you noticed, I put proof in quotation marks for this thread. The "standard" of proof in countless forum arguments has been an ABX test. And such a test was suggested, specifically foobar2000 ABX, as the method of evaluating these files. It was thought until now that such ABX tests could not generate positive results. My testing, and later that of others, showed that positive outcomes are possible.

What they require is critical listening ability and the motivation that comes from believing these differences can be found. A negative outcome is assured if one thinks the test is impossible, or if one does not pay attention to the smallest details, which in your case was the pop/glitch. All of this actually breaks new ground in these forum discussions. It shows that there are critical listeners who do much better than others in these tests. Even after the information you talk about was disclosed by others, the vast majority of people still can't tell these files apart. So we have pretty strong proof that real "golden ears" exist.

Why is that important? Because expert listeners are a requirement for any listening test that is searching for small differences. But up to now, it was assumed that there was no such class of listeners. Therefore we could take the outcome of any blind test, regardless of the skill of the listeners, and apply it to the audiophile population at large. I don't remember reading about any of the DIY ABX tests that had expert listeners. Or training.

This is the most important lesson here. It should lead to more careful tests and more reliable results.
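For readers wondering what counts as a statistically significant foobar2000 ABX log, here is a minimal sketch of the underlying binomial arithmetic, assuming Python with scipy; the trial counts are illustrative, not anyone's actual results.

```python
# One-sided binomial test: the probability of scoring at least this many
# correct ABX trials by pure guessing (p = 0.5 per trial).
from scipy.stats import binomtest

correct, trials = 9, 10                     # illustrative numbers only
result = binomtest(correct, trials, p=0.5, alternative="greater")
print(f"{correct}/{trials} correct: p = {result.pvalue:.4f}")   # ~0.0107
```

A single listener producing a log like this is evidence that that listener can hear a difference; whether it generalizes to listeners at large is the separate question argued below.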
 

arnyk

New Member
All disagreements are welcome Tony. We simply draw the line when it gets personal rather than discussing the technical topic.

If you noticed, I put proof in quotation marks for this thread. The "standard" of proof in countless forum arguments has been an ABX test. And such a test was suggested, specifically foobar2000 ABX, as the method of evaluating these files. It was thought until now that such ABX tests could not generate positive results. My testing, and later that of others, showed that positive outcomes are possible.

Positive outcomes can be meaningless as I showed by obtaining similar positive results which I repudiate on the grounds that I gamed the system.

Positive outcomes are questionable unless duplicated by a number of other experimenters which AFAIK hasn't happened.

The purpose of the evaluation is determining what a statistically significant percentage of listeners can do, not just one or two.

Positive outcomes are at best questionable unless performed under the direct observation of a qualified but neutral observer, which AFAIK also hasn't happened. If JJ had been observing...
 

amirm

Banned
Seattle, WA
Positive outcomes can be meaningless as I showed by obtaining similar positive results which I repudiate on the grounds that I gamed the system.
I did not game the system, however. You said you found the difference due to intermodulation distortion, which my system does not produce.

Positive outcomes are questionable unless duplicated by a number of other experimenters which AFAIK hasn't happened.
I myself find negative outcomes more questionable than positive ones. When there is a positive, as we have had here, then we have something to chase and understand. Recall the three or more versions of your test files we ran. If the outcome had been negative, you wouldn't know whether there was a problem with the test or whether the outcome really should have been negative.

This is especially problematic because none of the DIY double blind tests follow best practices such as having controls, inclusion of expert listeners, etc.

The purpose of the evaluation is determining what a statistically significant percentage of listeners can do, not just one or two.
Depends on the purpose of the test. If you are trying to say what a significant percentage of listeners can hear, then yes, you need to have a good size sample that represents them.

The claim however has been different: that such differences are below the Just Noticeable Difference (JND) of human beings in general. You routinely say that, and go on to say these differences are orders of magnitude below JND. Invalidating that position then takes only one listener who can hear the difference to statistical significance. After all, the JND claim had better cover people like me, or else it is not JND!

Positive outcomes are at best questionable unless performed under the direct observation of a qualified but neutral observer, which AFAIK also hasn't happened. If JJ had been observing...
Again, the same is true of negative outcomes. Unfortunately DIY tests are notorious for having no such "qualified" supervision. Witness the Meyer and Moran test, which ignored a number of best practices, including using material that did not even have ultrasonic content!

Witness how you yourself are learning the potential flaws of these tests just now. So having someone like you supervise the tests would not have meant anything either.
 

JackD201

WBF Founding Member
Manila, Philippines
Positive outcomes can be meaningless as I showed by obtaining similar positive results which I repudiate on the grounds that I gamed the system.

Positive outcomes are questionable unless duplicated by a number of other experimenters which AFAIK hasn't happened.

The purpose of the evaluation is determining what a statistically significant percentage of listeners can do, not just one or two.

Positive outcomes are at best questionable unless performed under the direct observation of a qualified but neutral observer, which AFAIK also hasn't happened. If JJ had been observing...

You gamed the system? To what end?
 

Phelonious Ponk

New Member
Positive outcomes can be meaningless as I showed by obtaining similar positive results which I repudiate on the grounds that I gamed the system.

Positive outcomes are questionable unless duplicated by a number of other experimenters which AFAIK hasn't happened.

The purpose of the evaluation is determining what a statistically significant percentage of listeners can do, not just one or two.

Positive outcomes are at best questionable unless performed under the direct observation of a qualified but neutral observer, which AFAIK also hasn't happened. If JJ had been observing...

You should go ahead and remove "positive" from all of those points, except for the one about the purpose of the evaluation. And I'm not sure that one is accurate. Maybe it should be; after all, the ultimate objective here is to enjoy listening to music, not to train ourselves to hear flaws in its reproduction. But the claim, in the wake of Meyer and Moran, and many informal ABX tests, was that there was no audible difference, not that there was not a statistically significant sample that could hear the difference. That's what Meyer and Moran indicated. And that sample did include trained, expert listeners; it included audiophiles, recording engineers, and recording engineering students. It included self-proclaimed experts, trained and experienced experts, and experts in training. And in spite of some procedural flaws, it was conducted with more rigor, controls and statistical validity than any other study of the subject I'm aware of.

It doesn't rise to the standard of proof, but what it indicates, to statistical significance, is that when listening to a variety of high quality systems under a variety of conditions, a broad variety of listeners could not hear a difference between hi res and RB files.

What Amir's result indicates is that when listening for very specific things during the playback of very specific passages, under very specific conditions, differences that may or may not be music can be heard.

I'm happy to leave it to individuals to rationalize that information to fit whatever it is they wish to believe.

Tim
 

jkeny

Industry Expert, Member Sponsor
Ireland
What Amir's result indicates is that when listening for very specific things during the playback of very specific passages, under very specific conditions, differences that may or may not be music can be heard.

I'm happy to leave it to individuals to rationalize that information to fit whatever it is they wish to believe.

Tim
Good post, Tim, but I'm not sure your summation of Amir's results is 100% correct. I'm sure he will answer this himself, but I believe you forget his description of the procedure he went through. Prior to actually doing the ABX test itself he played both tracks & was able to identify them 100% of the time. He focussed in on specific parts of the tracks that he felt sounded significantly different, in order to deal with the psychological issues of the ABX testing itself. So, if he had not been required to produce the ABX statistics for posting, his test could have stopped at being able to differentiate 100% correctly between the tracks. In other words, in normal listening he could tell a difference & it wasn't because of IMD.
 

arnyk

New Member
You gamed the system? To what end?

To show that the test, due to its very nature, was exceptionally susceptible to producing positive results that did not support the thesis being tested.

In terms of doing that, I only scratched the surface.

Stepping back a few feet I think that the outcome of the tests has been very positive for the credibility of improved listening test technology.

Blind tests are no longer producing only negative results.

A high level of transparency has been provided.

Very small differences are being heard.
 

arnyk

New Member
If you followed the conversation on this thread you would realize that the level difference being talked about is in the audio band

I just reread all 90 pages of the thread and in fact found no such information. Furthermore, I found the exact opposite being stated by a poster named Esldude.

I found a goodly number of other errors of fact, so some people around here received false benefits from my absence.

Please provide the post number in which this alleged discrepancy was found.
 

Phelonious Ponk

New Member
Good post, Tim, but I'm not sure your summation of Amir's results is 100% correct. I'm sure he will answer this himself, but I believe you forget his description of the procedure he went through. Prior to actually doing the ABX test itself he played both tracks & was able to identify them 100% of the time. He focussed in on specific parts of the tracks that he felt sounded significantly different, in order to deal with the psychological issues of the ABX testing itself. So, if he had not been required to produce the ABX statistics for posting, his test could have stopped at being able to differentiate 100% correctly between the tracks. In other words, in normal listening he could tell a difference & it wasn't because of IMD.

I just re-read Amir's posts on the first page of this thread, where he describes his methodology, and saw no reference to this. I did, however, find this:

Tracks 1 and 3 were easy to tell apart but both took a fair amount of effort to find the critical segments where the difference could be heard. The middle track #2 was essentially not distinguishable but I managed to finally find the difference.

And this:

Also note the speed with which I was able to identify the right tracks. On average it was just 10 seconds, which included listening to both "X" and "Y" tracks, voting, and then telling foobar to go to the next comparison. To do that requires remembering exactly what the difference is and having no need to listen to A or B reference samples again.
In his initial comments, at least, Amir neither describes the procedure you outlined above nor makes it sound particularly stressful. It does, however, sound as if it took some effort -- the differences to listen for had to be identified, and critical segments highlighting those differences had to be found to differentiate the samples. I'm pretty sure that's what it says, anyway. Perhaps Amir can elaborate. I'm also sure something like what you said above is in there somewhere, but even in my new-found retirement I don't have time to re-read this whole thread. :)

Tim
 
