Do blind tests really prove small differences don't exist?

andy_c · May 16, 2011

sasully said:
Leventhal's work in this area is mainly a reminder that Type 2 errors exist (not just Type 1) and should be factored into the statistics, and that the statistical power of a test is important too.

Neither of these suggests that DBT is intrinsically incapable of 'detecting' small differences. It's really just a call to do them right -- and not to let conclusions exceed the message of the data.

Agree completely.

I was somewhat hesitant to link to that Stereophile discussion, and did so only because it was the only way to get more detail about Leventhal's article without having to actually buy the article. It seems to me that people on both sides of the issue drew more from Leventhal's arguments than Leventhal himself intended.
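
To put rough numbers on the Type 2 error and statistical power point quoted above, here is a minimal sketch. It assumes a forced-choice test such as ABX, where a listener who is purely guessing is right half the time. The function names and the example detection rate of 0.6 are illustrative assumptions, not figures taken from Leventhal's article.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_score(n, alpha=0.05):
    """Smallest score whose probability under pure guessing is <= alpha.

    Returns n + 1 if no score can reach significance with n trials.
    """
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1

def power(n, p_true, alpha=0.05):
    """Chance that a listener who is right with probability p_true passes.

    The Type 2 error rate is 1 - power.
    """
    return binom_tail(n, critical_score(n, alpha), p_true)

if __name__ == "__main__":
    # Hypothetical listener who hears a small difference 60% of the time.
    for n in (16, 50, 100):
        print(n, critical_score(n), round(power(n, 0.6), 3))
```

Under these assumptions, a short test keeps the Type 1 risk at or below 5% but has low power against a listener who is only slightly better than chance, so a null result from a short test says little about small differences.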

JackD201 · May 16, 2011

andy_c said:
It is evidence of measurable differences, but I assume we're talking audible differences. One could say the measurements are evidence of potential audible differences though.

Precisely Andy. The two are not interchangeable. Sadly enough we see this all too often. One answers what is, one answers how much it might matter.

How about a curveball? In audio we try to look for patterns in a piece of equipment's performance. Tendencies or fingerprints if you will that will somehow help us predict outcomes. To do this we have to listen over a wide range of recorded material. This takes time. This is one area where small differences are not well served by any test that does quick switching.

arnyk · May 19, 2011

JackD201 said:
Precisely Andy. The two are not interchangeable. Sadly enough we see this all too often. One answers what is, one answers how much it might matter.

How about a curveball? In audio we try to look for patterns in a piece of equipment's performance. Tendencies or fingerprints if you will that will somehow help us predict outcomes. To do this we have to listen over a wide range of recorded material. This takes time. This is one area where small differences are not well served by any test that does quick switching.

No curveball at all. We've been dealing with the fact that only certain portions of certain recordings make certain audible defects most clear for over 30 years. If you check the literature of successful DBTs you will find lists of recordings and portions of recordings that various workers found to be most effective.

Your comments about quick switching raise two questions:

Quick switching means being able to switch quickly whenever you want with ease and thus instantly hear what you want to hear. The opposite of quick switching is slow switching which means that listening to the alternative that you choose is slow, laborious, and frustrating. Do you actually prefer frustration as opposed to ease?

Phelonious Ponk · May 19, 2011

JackD201 said:
Precisely Andy. The two are not interchangeable. Sadly enough we see this all too often. One answers what is, one answers how much it might matter.

How about a curveball? In audio we try to look for patterns in a piece of equipment's performance. Tendencies or fingerprints if you will that will somehow help us predict outcomes. To do this we have to listen over a wide range of recorded material. This takes time. This is one area where small differences are not well served by any test that does quick switching.

A really interesting test would be to have a group of audiophiles listen to a component for a few weeks, over a wide range of recorded material, etc., then bring them in and test that same component blind.

Tim

arnyk · May 19, 2011

Phelonious Ponk said:
A really interesting test would be to have a group of audiophiles listen to a component for a few weeks, over a wide range of recorded material, etc., then bring them in and test that same component blind.

This has been done many, many times. The outcome of the test depends on the question that was in mind when the test was set up.

JackD201 · May 19, 2011

arnyk said:
No curveball at all. We've been dealing with the fact that only certain portions of certain recordings make certain audible defects most clear for over 30 years. If you check the literature of successful DBTs you will find lists of recordings and portions of recordings that various workers found to be most effective.

Your comments about quick switching raise two questions:

Quick switching means being able to switch quickly whenever you want with ease and thus instantly hear what you want to hear. The opposite of quick switching is slow switching which means that listening to the alternative that you choose is slow, laborious, and frustrating. Do you actually prefer frustration as opposed to ease?

Perhaps I should rephrase. I did mention small differences. By small I'm taking this subject literally. An example would be settings for a phono cartridge. I have 130g, 150g, 180g and 200g LPs. I can adjust VTA for each but I would rather not. It sucks the fun out of just listening. So I have to find a happy compromise. At a certain VTA setting, what am I giving up, gaining or losing for each LP thickness? I know the difference is there. My VTA post markers show me. I can see the difference in SRA with a magnifying glass. I can measure changes in tracking weight to a hundredth of a gram. Now with whatever thickness, the cartridge is still well within its operating range if set ever so slightly tail up on a 150g record, so I am not getting any telltale gross FR skews or tracking distortions. The slight change in FR curve, however, leads to emphasis on different sonic events. Tracking tail up highlights strings, tail down bass guitars, etc. How in this case does blind testing my settings help me make up my mind? Changing LPs is quick enough. Definitely quicker than changing loudspeakers on Harman's rotating stage. Changing VTA on a Graham is even quicker than changing LPs; heck, you can do it on the fly. To appreciate these small changes, I have to run through familiar recordings on every thickness. Just like a car, I have to drive it quite a bit. Five-minute test drives don't cut it.

arnyk · May 19, 2011

JackD201 said:
Perhaps I should rephrase. I did mention small differences. By small I'm taking this subject literally. An example would be settings for a phono cartridge. I have 130g, 150g, 180g and 200g LPs. I can adjust VTA for each but I would rather not. It sucks the fun out of just listening. So I have to find a happy compromise. At a certain VTA setting, what am I giving up, gaining or losing for each LP thickness? I know the difference is there. My VTA post markers show me. I can see the difference in SRA with a magnifying glass. I can measure changes in tracking weight to a hundredth of a gram. Now with whatever thickness, the cartridge is still well within its operating range if set ever so slightly tail up on a 150g record, so I am not getting any telltale gross FR skews or tracking distortions. The slight change in FR curve, however, leads to emphasis on different sonic events. Tracking tail up highlights strings, tail down bass guitars, etc. How in this case does blind testing my settings help me make up my mind? Changing LPs is quick enough. Definitely quicker than changing loudspeakers on Harman's rotating stage. Changing VTA on a Graham is even quicker than changing LPs; heck, you can do it on the fly. To appreciate these small changes, I have to run through familiar recordings on every thickness. Just like a car, I have to drive it quite a bit. Five-minute test drives don't cut it.

So far I see nothing that prevents quick switching from being the tool of choice. It seems to me that you have mixed up quick switching with the use of short snippets of music.

Here is how I would address your problem, and I can see how it might be totally unacceptable to you.

I would make a family of needle drops, each related to a certain VTA within the range of interest.

I would then audition the needle drops in parallel, exploiting the benefits of quick switching and the ability to reliably backspace and forward space in time, and listen again and again to the recordings and critical sections of them to form my final conclusion.

I might even use a test LP as my source, and do technical observations of the reproduction of various artificial signals to further substantiate my conclusions.

Phelonious Ponk · May 19, 2011

arnyk said:
This has been done many, many times. The outcome of the test depends on the question that was in mind when the test was set up.

I would expect that if you began with a properly designed test carried to enough trials to exceed the statistical margin for error, that anything determined to be inaudible would still be inaudible to listeners after a much longer period of listening and acclimation. But my expectations are not always met, and I'm not a statistician.

I would also expect that the same discipline would address Amir's issues regarding erroneous results. Erroneous results are common. Well-designed tests greatly reduce the impact of error on the conclusions.

Tim
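
As a rough illustration of what "enough trials" could mean in the post above, the sketch below searches for the smallest forced-choice test length that reaches a chosen power. It assumes each trial is independent with a 50% guessing rate. The names `trials_needed`, `target_power`, and `max_n`, and the example detection rates, are hypothetical choices for illustration rather than anything proposed in the thread.

```python
from math import comb

def pmf(n, i, p):
    """P(X == i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def critical_score(n, alpha):
    """Smallest passing score with guessing probability <= alpha (n + 1 if none)."""
    tail, k = 0.0, n + 1
    # Walk down from a perfect score, accumulating the upper tail under guessing.
    for i in range(n, -1, -1):
        term = pmf(n, i, 0.5)
        if tail + term > alpha:
            break
        tail += term
        k = i
    return k

def trials_needed(p_true, alpha=0.05, target_power=0.8, max_n=500):
    """First n at which a listener with hit rate p_true passes with target_power.

    Returns None if no n up to max_n is enough.
    """
    for n in range(1, max_n + 1):
        k = critical_score(n, alpha)
        power = sum(pmf(n, i, p_true) for i in range(k, n + 1))
        if power >= target_power:
            return n
    return None

if __name__ == "__main__":
    for p in (0.9, 0.7, 0.6):
        print(p, trials_needed(p))
```

Because scores are whole numbers, power does not rise perfectly smoothly with the number of trials, so the value returned is only the first length that meets the target. The general pattern still holds under these assumptions: the closer the true hit rate is to 50%, the more trials are needed.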

JackD201 · May 19, 2011

arnyk said:
So far I see nothing that prevents quick switching from being the tool of choice. It seems to me that you have mixed up quick switching with the use of short snippets of music.

Here is how I would address your problem, and I can see how it might be totally unacceptable to you.

I would make a family of needle drops, each related to a certain VTA within the range of interest.

I would then audition the needle drops in parallel, exploiting the benefits of quick switching and the ability to reliably backspace and forward space in time, and listen again and again to the recordings and critical sections of them to form my final conclusion.

I might even use a test LP as my source, and do technical observations of the reproduction of various artificial signals to further substantiate my conclusions.

With all due respect Arny, substantiate what? I have absolutely no idea what conclusions you are referring to. The first post of yours I ever read was the one addressed to me.

I have absolutely no qualms with the way you would approach my problem. My way works for me, if yours works for you then it's all good as far as I'm concerned.

arnyk · May 19, 2011

Phelonious Ponk said:
I would expect that if you began with a properly designed test carried to enough trials to exceed the statistical margin for error, that anything determined to be inaudible would still be inaudible to listeners after a much longer period of listening and acclimation. But my expectations are not always met, and I'm not a statistician.

I would also expect that the same discipline would address Amir's issues regarding erroneous results. Erroneous results are common. Well-designed tests greatly reduce the impact of error on the conclusions.

That is what seems to happen. The biggest influences on the sensitivity of results are listener selection/training and choice of musical program material.

On the topic of how to independently validate ABX, one means to validate ABX is to try to use ABX to test for the thresholds of audibility that are generally accepted, based on other kinds of tests. IME ABX generally does a little better but not enough better that the improved results are outside of the range of probable variations for the original results.

Phelonious Ponk · May 19, 2011

arnyk said:
That is what seems to happen. The biggest influences on the sensitivity of results are listener selection/training and choice of musical program material.

On the topic of how to independently validate ABX, one means to validate ABX is to try to use ABX to test for the thresholds of audibility that are generally accepted, based on other kinds of tests. IME ABX generally does a little better but not enough better that the improved results are outside of the range of probable variations for the original results.

Stick around, Arny.

Tim

arnyk · May 19, 2011

JackD201 said:
With all due respect Arny, substantiate what? I have absolutely no idea what conclusions you are referring to.

In this context, the conclusion I am thinking of is which VTA pleases you the most.

JackD201 · May 19, 2011

In that case, well yes, I suppose it could work too. My main problem with it is that my musical enjoyment is not in sonics per se but rather within the musical context. You're right then, to me quick switching is synonymous with short snippets of music and this doesn't work well at all for me. When I get too "perfectionist" for short snippets, I can make that particular one sound pretty much the way I want it to. The problem is that that doesn't translate. It's a bigger problem when you're a genre hopper like I am. I only use test tones for calibration. The good thing is I enjoy the process of getting the sound I want; the bad thing is I don't enjoy the process to such an extent that I'll do it for every single track. It defeats my primary purpose, which is really just to kick back and relax.

Phelonious Ponk · May 20, 2011

JackD201 said:
In that case, well yes, I suppose it could work too. My main problem with it is that my musical enjoyment is not in sonics per se but rather within the musical context. You're right then, to me quick switching is synonymous with short snippets of music and this doesn't work well at all for me. When I get too "perfectionist" for short snippets, I can make that particular one sound pretty much the way I want it to. The problem is that that doesn't translate. It's a bigger problem when you're a genre hopper like I am. I only use test tones for calibration. The good thing is I enjoy the process of getting the sound I want; the bad thing is I don't enjoy the process to such an extent that I'll do it for every single track. It defeats my primary purpose, which is really just to kick back and relax.

I suspect short snippets are best for determining whether or not a difference is audible, longer segments are better for finding preference, and days/weeks of listening sighted are best for talking ourselves into hearing what we think we should hear.

Tim

arnyk · May 20, 2011

JackD201 said:
You're right then, to me quick switching is synonymous with short snippets of music and this doesn't work well at all for me.

I agree that listening to short snippets is very unnatural until you get used to it. It is probably very helpful to follow an audible difference from its original context into a snippet that highlights it best. Until you've experienced the process of finding songs that best illustrate a difference from among many songs, and then find the portion of the song that again does the best job of illustrating the difference, there's no reason to have any faith in this process.

IME you have to figure out what you want to do. Do you want to listen to music for enjoyment, or do you want to hear differences?

JackD201 · May 20, 2011

I was not aware that that was mutually exclusive.

Phelonious Ponk · May 20, 2011

JackD201 said:
I was not aware that that was mutually exclusive.

I think it has just been illustrated:

arnyk said:
I agree that listening to short snippets is very unnatural until you get used to it. It is probably very helpful to follow an audible difference from its original context into a snippet that highlights it best. Until you've experienced the process of finding songs that best illustrate a difference from among many songs, and then find the portion of the song that again does the best job of illustrating the difference, there's no reason to have any faith in this process.

And Arny mercifully skipped the parts where you train yourself to hear things like jitter artifacts, then listen to the aforementioned carefully chosen snippets on very revealing high-end headphone systems at abnormally high volumes to find them. By the time you get through all of that, you're hardly listening to music at all.

Tim

arnyk · May 20, 2011

JackD201 said:
I was not aware that that was mutually exclusive.

Most definitely so.

I suspect that many have thought that their musical enjoyment was hindered by subtle technical problems with their audio systems. Therefore their enjoyment of their system can be restored or enhanced by removing or reducing these subtle technical problems.

Actual experience with small differences tells a somewhat different story. The majority, if not all, of these small differences are completely overlooked in the normal process of listening for enjoyment. They can only be noticed if one does very quick comparisons involving that tiny minority of all music that actually makes the underlying technical situation audible.

It takes relatively large differences to make listening actually be less enjoyable unless of course you know by some other means than listening that they are there. Of course with things like loudspeakers and rooms in the signal chain, these kinds of differences can abound.

arnyk · May 20, 2011

Phelonious Ponk said:
...Arny mercifully skipped the parts where you train yourself to hear things like jitter artifacts, then listen to the aforementioned carefully chosen snippets on very revealing high-end headphone systems at abnormally high volumes to find them. By the time you get through all of that, you're hardly listening to music at all.

Since you mentioned jitter... While modern audiophiles are turning their guts inside out worrying about AVRs with say 900 picoseconds of jitter, we need to remind ourselves that the very finest studio grade analog tape machines ever made had no less than 1 million picoseconds of jitter. Makes you wonder why people aren't running out of the room with blood spurting from their ears every time they listen to a SACD or DVD-A transcription of something that was originally recorded on analog tape, no? ;-)
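
For scale, the two figures in the post above can be put in the same units. This small sketch takes the quoted numbers at face value (900 picoseconds and 1 million picoseconds); it does not verify either figure, and the function names are illustrative.

```python
PS_PER_MICROSECOND = 1_000_000  # 1 microsecond = 1,000,000 picoseconds

def ps_to_us(picoseconds):
    """Convert picoseconds to microseconds."""
    return picoseconds / PS_PER_MICROSECOND

def jitter_ratio(larger_ps, smaller_ps):
    """How many times larger one timing error is than another."""
    return larger_ps / smaller_ps

if __name__ == "__main__":
    avr_ps = 900          # figure quoted for an AVR
    tape_ps = 1_000_000   # figure quoted for analog tape machines
    print(ps_to_us(tape_ps), "microseconds")
    print(round(jitter_ratio(tape_ps, avr_ps)), "times larger")
```

On the quoted figures, the tape number is one microsecond, a little over a thousand times the AVR number.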

Phelonious Ponk · May 20, 2011

arnyk said:
Since you mentioned jitter... While modern audiophiles are turning their guts inside out worrying about AVRs with say 900 picoseconds of jitter, we need to remind ourselves that the very finest studio grade analog tape machines ever made had no less than 1 million picoseconds of jitter. Makes you wonder why people aren't running out of the room with blood spurting from their ears every time they listen to a SACD or DVD-A transcription of something that was originally recorded on analog tape, no? ;-)

I assume you are using jitter in a broad sense to address timing errors common to analog but typically not referred to as jitter?

Tim
