Double Blind Testing and the threshold of necessity

Gregadd

WBF Founding Member
Apr 20, 2010
10,553
1,786
1,850
Metro DC
I was involved in two listening tests where we compared digital active crossovers. We compared them working purely as digital converters, and then using the same filters. We followed my suggestion to do instant switching, but only managed to get it working on the second event. It was an informal series of tests with no intention to convince anyone else - more a matter of satisfying our own curiosity. Later we decided to share the result online, and I posted in a few different places. We were criticised for not doing it blind - I should have expected that. The group as a whole wasn't strongly biased towards any significant outcome - we suspected that differences would be very minor or not there at all. That is mostly what we found, except where measurements showed we had a different response. My contention was that the instant switching method was adequate to overcome any bias issues. We could switch in the middle of a sustained note and listen only for differences at the switch. No audio memory. We switched so frequently that it was not really possible to be biased because you quickly lose track unless you are deliberately trying to cheat. Yet some are never satisfied unless it's done double blind ABX style.

I would much rather have a sighted instant switching test with 10 enthusiasts than a double blind ABX test with 100 people that relies on audio memory. I contend that even when tested blind, if one has to remember what they heard then compare it to what they hear right now, audio memory invalidates the test when the differences are subtle. Our listening system just isn't well designed for that kind of comparison. The same is true of our sight. Take two colour samples that are just slightly different and hold them apart. Can you tell which one is darker? Could 100 people tell? Even if they do no better than 50/50, any one person could pick the difference if you put them right next to each other with nothing in between. If this is true with sight, which we by nature tend to trust more than our ears, then how much more true is it of audio?

Where is the threshold? When do you need to test it blind? Well, perhaps the answer is also related to your goal. Do you want to satisfy your own curiosity only? Then I'd do an instant switch test and do it long enough to form an opinion. Do you want to try to prove a point? It may be that amplifiers that measure within certain conditions will sound the same when not clipping. Or do cables sound different? That suggests a blind test. However, IMHO bias might not always be the major concern with a test. If you are doing your own small test and your bias is "there won't be a difference" then if anything that bias may cause you to miss very small differences. However, bigger concerns are spurious factors that can influence the result. Levels not matched. Frequency response differences that alone could account for differences perceived.
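As a rough illustration of the "levels not matched" concern, the level difference between two captures can be estimated from their RMS values before any listening starts. This is only a sketch in Python: the function names are made up for illustration, and the test tones are synthetic rather than real recordings.

```python
import math

def rms(samples):
    """Root-mean-square of a sequence of sample values."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Level of b relative to a in dB, based on RMS values."""
    rms_a, rms_b = rms(a), rms(b)
    if rms_a == 0 or rms_b == 0:
        raise ValueError("cannot compare against a silent signal")
    return 20.0 * math.log10(rms_b / rms_a)

if __name__ == "__main__":
    # Synthetic 1 kHz tone at 48 kHz; the second copy is 5% higher in amplitude.
    tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
    louder = [1.05 * s for s in tone]
    print(f"{level_difference_db(tone, louder):+.2f} dB")
```

A 5% amplitude difference is a bit under half a dB, small enough to go unnoticed as "louder" yet a plausible candidate for being heard as a difference in quality, which is why matching levels comes before worrying about bias.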

I agree almost 100%.



Just for fun I have tried some of the short DBT tests. I could not do anything with the 10-15 second music clips. They were too short for me to memorize or form an opinion about the quality of the sound. The way we learn has some bearing here. Memory tends to prioritize the things we use the most. Memorizing a clip of music and the way it sounds is different from memorizing a girl's phone number. On top of that, if DBT is reserved only for small differences, that makes identifying and memorizing the differences all the more difficult. That's why we have to study. We expose ourselves to material repeatedly and the brain tends to give it priority. We will always remember what blue is. If we don't expose ourselves to the finer differences in shades we might forget them. Added to that, for some people blue is blue and they couldn't care less about different shadings.
 

microstrip

VIP/Donor
May 30, 2010
20,807
4,700
2,790
Portugal
I can't find the quote right now but Ethan responded to several posts saying if you can't measure it you can't hear it. Moreover Ethan has repeatedly postulated that even if you can measure it, that does not mean it is audible. The fact that Ethan let a sighted listening test trump measurements represents, to me, an epiphany.

Greg,
Just using a blindfold does not make a DBT. As far as I have read in this forum, Ethan never asked for or really carried out a proper DBT (please correct me if you find a description of one in his postings), just unsighted challenges where the probability of his victory is statistically high, even if the effect is easily noticed. BTW, even if someone is lucky in his 5 out of 5 challenges it does not prove anything.
 

Paul Spencer

Well-Known Member
Oct 4, 2010
48
0
296
As you say, blind tests aren't for comparing red to blue, or even cyan to blue - they are for comparing tiny variations of blue. In my blog I put together a visual illustration ... this relates to the idea many use about "trusting your ears."



Obviously the two sides of the donut look different. Do you trust your eyes? We know they can be tricked because we've already been shown many times how they can be tricked. We know at times not to trust the information our eyes give us.



This is the same image with the divider removed - no other change. So we've all learnt sometimes we can't trust our eyes alone - we rely on some understanding as well.

With audio, we have not had the benefit of this. It's perhaps more difficult to demonstrate how we can be tricked or how we can even fail to pick something. The audio medium just doesn't lend itself to revealing this kind of thing in the same way. So we go on unaware of some of our limits.

The point of all of this is to show that instant switching is needed. Those short samples are just the thing, because it's at the moment of switching that you get a more accurate comparison. That moment is the one in which the red line is removed and the contrast is seen, or lack thereof.

The full post is here:
http://redspade-audio.blogspot.com/2010/08/basic-measurement-setup.html

Blind tests are a great idea for someone else to do. Firstly, they are tedious. Secondly, they are about trying to find out if minor differences can be heard. Often they are the kind of differences that fly under the radar.
 

Gregadd

There is a vast difference between trustworthy and infallible. Even a precision machine can be tricked.

Ethan's position on DBT is well documented on this forum, notwithstanding the fact that it seems to be reserved for those who disagree with him rather than for himself. And it's a modified DBT at that.

I thought it was noteworthy that he chose what sounds good over what the graphs showed. He seemed to argue that was okay because the differences were so great. IMO that seemed to be a step in the right direction.
Silly me to think that any DBT discussion would be simple.


PS- Ethan posted a very interesting video of how the ears and eyes working together can also be fooled.
 

Gregadd

Greg,
Just using a blindfold does not make a DBT. As far as I have read in this forum, Ethan never asked for or really carried out a proper DBT (please correct me if you find a description of one in his postings), just unsighted challenges where the probability of his victory is statistically high, even if the effect is easily noticed. BTW, even if someone is lucky in his 5 out of 5 challenges it does not prove anything.

Agreed.
 

Ethan Winer

Banned
Jul 8, 2010
1,231
3
0
75
New Milford, CT
Just using a blindfold does not make a DBT. As far as I have read in this forum, Ethan never asked for or really carried out a proper DBT

Right, I have never done a full DBT. But I've given lots of single-blind tests, and I've tested myself blind enough times to be confident in my conclusions.

--Ethan
 

JackD201

WBF Founding Member
Apr 20, 2010
12,318
1,427
1,820
Manila, Philippines
Paul,

The images tell us that the eye can be fooled. Knowing why doesn't stop our eyes from being fooled the second time around, if even for a couple of seconds, if you used the same contrast experiment with different shapes or other complementary colors. The same goes for sound, touch, taste or smell. Take taste. Sweetness and Saltiness complement each other and Saltiness and Sourness diminish each other. A blind test is not going to tell you how much sugar and sodium are in Pepsi or Coke; it can only tell you what people find sweeter... on average at that.

The original question of this thread is a TRICK QUESTION. It's actually a bit funny. The purpose of any unsighted test as applied to audio is exactly to find out what the threshold is for the respondents. Those administering the tests already know the differences or, as in your example, the exactness. To put it bluntly, blind tests are designed to remove bias IN ORDER TO find out the minimum requirements, what you can get away with. Its other purpose is to make sure that the designer himself isn't just imagining things. Now here is where trained respondents REALLY become important. Your panel has got to be on par with the designers! Double blinds are for the latter scenario, being more stringent in that even observation bias is removed. It still has a third party to administer it.

To be frank, what gives blind tests a bad rap is people reading more into them than they should.

Peace

Jack
 

JackD201

Thanks Ron and yeah, I think my head's about to burst from trying to keep up with and digest all those really cool threads. I'm loving it. ;) ;) ;)
 

Gregadd

As you say, blind tests aren't for comparing red to blue, or even cyan to blue - they are for comparing tiny variations of blue. In my blog I put together a visual illustration ... this relates to the idea many use about "trusting your ears."



Obviously the two sides of the donut look different. Do you trust your eyes? We know they can be tricked because we've already been shown many times how they can be tricked. We know at times not to trust the information our eyes give us.



This is the same image with the divider removed - no other change. So we've all learnt sometimes we can't trust our eyes alone - we rely on some understanding as well.

With audio, we have not had the benefit of this. It's perhaps more difficult to demonstrate how we can be tricked or how we can even fail to pick something. The audio medium just doesn't lend itself to revealing this kind of thing in the same way. So we go on unaware of some of our limits.

The point of all of this is to show that instant switching is needed. Those short samples are just the thing, because it's at the moment of switching that you get a more accurate comparison. That moment is the one in which the red line is removed and the contrast is seen, or lack thereof.

The full post is here:
http://redspade-audio.blogspot.com/2010/08/basic-measurement-setup.html

Blind tests are a great idea for someone else to do. Firstly, they are tedious. Secondly, they are about trying to find out if minor differences can be heard. Often they are the kind of differences that fly under the radar.


Paul, for me at least we see another side of testing: the need to see the Emperor's new clothes and appreciate their beauty. Try as I might, I don't see the difference. I feel like the old man in "Moonstruck": "I am so confused."
 

Gregadd

Originally Posted by Gregadd
Ethan here is what you said-

The conventional wisdom is you need to be at least one foot away for each inch of diffusor depth. So with a 6-inch deep diffusor you're supposed to be at least six feet away. However, my own listening tests conclude otherwise. I've sat directly in front of the 6-inch deep diffusors my company sells, and the sound was vastly better than a bare wall. Whether you will prefer that to having 100 percent absorption behind your head is hard to say, but I imagine you would. However, this applies only for genuine QRD type diffusors, not random stuff like bookshelves or a curved piece of wood etc.
So then you debunked conventional wisdom with your own sighted test.

More: A few years ago I tested this extensively in my home studio over a period of one month. At that time I was recording and mixing the music for my Tele-Vision music video. My studio is 33 feet from front to back, with the rear wall almost 30 feet behind me. So I placed two diffusors on stands side by side, and put them right behind my chair. At that time I was working and listening for about an hour or two every day. So for one week I had the diffusors literally a foot behind my head. Then I moved them back another foot or so and worked like that for a week. And so forth.

This was a useful experiment because I was able to live with each distance for a while. And that was when I concluded that the advice to be at least so many feet away was not valid. There are graphs that show how diffusors scatter over distance, and how the resonance from each QRD well needs distance to combine. That's absolutely correct and I agree with all of it. But having diffusors very close still sounds good, and again it sounds infinitely better than a bare reflecting wall.

You go against scientific wisdom supported by the graphs because it "sounds good" in a non-DBT.

Also, not to turn this into a product pitch, but this is why my company sells both "far" and "near" diffusors. Both are six inches deep, but for the near type the diffusor portion is only three inches deep, backed by a three-inch deep bass trap which is also useful directly in front of a wall.

Originally Posted by Gregadd
In support of your failure to use DBT you offer this rationale.

Nope. But that's not necessary because the effect was obvious. DBT is needed only to determine if something that is otherwise very subtle (and thus suspect) is actually audible. Nobody would miss the difference between diffusers and no diffusors.

That is the exact same argument used by reviewers who argue they don't need DBT. I can provide examples of them saying that if you like. Since you have spent so much time arguing against it, I'm sure you don't need any.

These reviewers also claim DBT is not needed because, inter alia, the difference is obvious. I for one am really encouraged that you found listening a useful tool in refuting "conventional wisdom." Yeah I know, you never said you did not. It's just great to have an actual example.

Greg, please post these verbatim in the other thread as Ron asked, and I'll be glad to explain the errors in your logic.

--Ethan

Sure Ethan -Please explain.

I think you and others have already said that it does not matter if the differences are large. You further elaborated that DBT is needed if it goes against the science. Not that I found these explanations contradictory to what you did. I don't find any error in my logic and found your statements self explanatory.
Just as a joke: you are so sufficient in carving up quotes, it is strange you needed me to get it over here.
 

garylkoh

WBF Technical Expert (Speakers & Audio Equipment)
Sep 6, 2010
5,599
225
1,190
Seattle, WA
www.genesisloudspeakers.com
Since we are discussing DBT - how many have done it at the threshold of perception? It is hard, exhausting work. In Foobar, the ABX comparator is a great tool. It allows you to double blind yourself. It is a humbling experience.
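For anyone trying the comparator, one rough way to read an ABX score is the chance of doing at least that well by pure guessing. The Python sketch below assumes every trial is an independent 50/50 guess when no difference is audible; the function name `abx_p_value` is made up for illustration and is not part of Foobar or its ABX component.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # One-sided binomial tail with a 0.5 chance of success per trial.
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

if __name__ == "__main__":
    for correct, trials in [(5, 5), (12, 16), (8, 16)]:
        print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

Under that assumption, 5 out of 5 corresponds to 1 chance in 32 (about 3%), so a five-trial run can never meet a stricter criterion such as 1%, however well the listener does. Longer runs are what make the result harder to put down to luck, and also what make the exercise so tiring.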

Some time ago, I posted two files - one was a 24/192 digital file from a DVD-A, and the other was a 24/192 digital file recorded off vinyl - digital vs digitized analog. As far as I can tell from my website logs, one file has only been downloaded 5 times, and the other 6 times. Only one person has posted his impression of the two files.

Recently, because of another thread here discussing NOS DACs vs upsampling DACs and a PM discussion with another forum member, I tested myself with a 24/192 digital file, and the same file downsampled to 24/44.1 and then re-upsampled to 24/192. I figure that non-integer down-sampling followed by non-integer up-sampling would be twice as bad as just upsampling.
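To see why 192 kHz to 44.1 kHz counts as a non-integer conversion, the ratio between the two rates can be reduced to lowest terms. A small Python sketch; the helper name `resample_ratio` is illustrative only and says nothing about how any particular resampler works internally.

```python
from fractions import Fraction

def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Return dst/src as a reduced fraction (interpolate by / decimate by)."""
    return Fraction(dst_hz, src_hz)

if __name__ == "__main__":
    down = resample_ratio(192_000, 44_100)
    up = resample_ratio(44_100, 192_000)
    print(f"192k -> 44.1k: {down.numerator}/{down.denominator}")
    print(f"44.1k -> 192k: {up.numerator}/{up.denominator}")
    # The conversion is an integer ratio only if one side reduces to 1.
    print("integer ratio:", 1 in (down.numerator, down.denominator))
```

The ratio reduces to 147/640 on the way down and 640/147 on the way back up, whereas 192 kHz to 96 kHz or 48 kHz would reduce to a simple 1/2 or 1/4.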

You may say that my recording technique is bad, or I don't know what I am doing, but for what it's worth here are the four files:

http://www.genesisloudspeakers.com/downloads/Look_of_Love_yat.wav
http://www.genesisloudspeakers.com/downloads/Look_of_Love_yee.wav

http://www.genesisloudspeakers.com/downloads/Les_Brown_1_yat.wav
http://www.genesisloudspeakers.com/downloads/Les_Brown_1_yee.wav


In case you don't have it, here's the link to the Foobar ABX comparator:
http://www.foobar2000.org/components/view/foo_abx

In case you think that I've given you identical tracks, here's the link to Audio Diffmaker:
http://www.libinst.com/Audio DiffMaker.htm

The objective is not to get it right. The objective is to hear forum members' impressions of the experience of double-blind testing in the comfort and stress-free environment of your own home.
 

Ethan Winer

I already explained much of this yesterday here:

http://www.whatsbestforum.com/showthread.php?2316-How-The-Ear-Works&p=29509&viewfull=1#post29509

So then you debunked conventional wisdom with your own sighted test.

That wasn't a "test" nor was it meant to be. Testing is to determine if a difference can be heard, not to establish preference!

You go against scientific wisdom supported by the graphs because it "sounds good" in a non-DBT.

Again I have the same comment. This has nothing to do with testing audibility. It's entirely about preference for which there can be no right and wrong.

That is the exact same argument used by reviewers who argue they don't need DBT.

In my experience, when a forum thread becomes a heated discussion, and those reviewers are challenged, they suddenly back down and what had been "glaringly obvious improvement" turns into "Well, I admit the effect is subtle, but I'm sure I heard it." I pushed this to the extreme a while back at the Stereophile forum when one of their reviewers claimed an obvious improvement after "demagnetizing" LP records. John Atkinson and Michael Fremer - who I'm sure believe their own BS - were kind enough to mail me Wave files of an LP Before and After running it through a demagnetizer. But when I posted excerpts online and asked them and everyone else at the forum to identify which was which, not only could nobody say which was After, nobody was even willing to post any opinion at all. So much for an obvious difference!

Look Greg, I'm sure you believe that competent cables etc can sound different, but I'm highly confident that when pressed to hear those "obvious differences" blind you will fail. Since you and I live too far apart to test this in person, this will likely never be resolved to your satisfaction. However, you are welcome to show any hard data to support your opinion. A good start would be two high-res recordings you make of Before and After for magnets or cable elevators or AC power wires or whatever tweaks you'd like.

I don't find any error in my logic and found your statements self explanatory.

Your main logic error is not understanding the difference between identifying a change in sound (DBT) versus preference for one sound over the other for which there is never one correct answer.

you are so sufficient in carving up quotes, it is strange you needed me to get it over here.

No, I asked you to copy your questions here because Ron rightly asked us to move it out of the bass traps thread.

--Ethan
 

amirm

Banned
Apr 2, 2010
15,813
37
0
Seattle, WA
Look Greg, I'm sure you believe that competent cables etc can sound different, but I'm highly confident that when pressed to hear those "obvious differences" blind you will fail.
Have you tried to test cables yourself Ethan? If so, how did you go about it?
 

microstrip

Look Greg, I'm sure you believe that competent cables etc can sound different, but I'm highly confident that when pressed to hear those "obvious differences" blind you will fail.
--Ethan

I would expect that a scientific answer to this question would be: as the cables introduce a frequency response alteration of less than X dB and distortion of less than Y%, and we cannot hear frequency response changes of less than Z dB and distortion of less than W%, we have to carry out a proper DBT to test it.

Ethan,
Can you put numbers in X, Y, Z and W?
 

Gregadd

I already explained much of this yesterday here:

http://www.whatsbestforum.com/showthread.php?2316-How-The-Ear-Works&p=29509&viewfull=1#post29509


That wasn't a "test" nor was it meant to be. Testing is to determine if a difference can be heard, not to establish preference!

Again I have the same comment. This has nothing to do with testing audibility. It's entirely about preference for which there can be no right and wrong.

That is entirely my point, Ethan: you indulged a preference over science. To me that is extremely encouraging. A deviation that completely surprised me as an aberration from your usual strict scientific stance. Maybe one day you will prefer tubes over solid state or audiophile cables over zip cord. One can only hope.

It is axiomatic that in order to have a preference you must at least perceive a difference. Quoting Ethan Winer himself: if that difference goes against measurements then show me a DBT. Ethan also says if the difference is great no DBT is needed.



In my experience, when a forum thread becomes a heated discussion, and those reviewers are challenged, they suddenly back down and what had been "glaringly obvious improvement" turns into "Well, I admit the effect is subtle, but I'm sure I heard it." I pushed this to the extreme a while back at the Stereophile forum when one of their reviewers claimed an obvious improvement after "demagnetizing" LP records. John Atkinson and Michael Fremer - who I'm sure believe their own BS - were kind enough to mail me Wave files of an LP Before and After running it through a demagnetizer. But when I posted excerpts online and asked them and everyone else at the forum to identify which was which, not only could nobody say which was After, nobody was even willing to post any opinion at all. So much for an obvious difference!

Once again, for someone who does not do DBT himself it is no wonder so many of your challenges fall on deaf ears. Could it be they did not accept your challenge because they don't care what you think? Uh, blasphemy ;)

Look Greg, I'm sure you believe that competent cables etc can sound different, but I'm highly confident that when pressed to hear those "obvious differences" blind you will fail. Since you and I live too far apart to test this in person, this will likely never be resolved to your satisfaction. However, you are welcome to show any hard data to support your opinion. A good start would be two high-res recordings you make of Before and After for magnets or cable elevators or AC power wires or whatever tweaks you'd like.

Your guess about my opinion on cables or any other audio gear and my performance in a DBT is completely speculative. Just as an aside, your own performance in a DBT remains speculative since you have never participated in a DBT by your own admission.


Your main logic error is not understanding the difference between identifying a change in sound (DBT) versus preference for one sound over the other for which there is never one correct answer.

I do understand the difference between preference and identifying a change. Tim constantly reminds me. FOR THE LAST TIME, it is a surprise to me that Ethan's preference goes against measurements. It further surprises me that you are not interested in testing yourself double blind to assure that preference is not the result of bias. The fact that I am surprised or encouraged does not invalidate your preference.

No, I asked you to copy your questions here because Ron rightly asked us to move it out of the bass traps thread.
Once again your penchant for quoting out of context has altered the meaning of a statement. I said the comment was a joke. But you could easily have copied the comment and brought it over yourself. I have done it.
 

garylkoh

Look Greg, I'm sure you believe that competent cables etc can sound different, but I'm highly confident that when pressed to hear those "obvious differences" blind you will fail.
--Ethan


I would expect that a scientific answer to this question would be: as the cables introduce a frequency response alteration of less than X dB and distortion of less than Y%, and we cannot hear frequency response changes of less than Z dB and distortion of less than W%, we have to carry out a proper DBT to test it.

Ethan,
Can you put numbers in X, Y, Z and W?

microstrip,

I don't have numbers to prove that cables would not be audible, but if I may, I'd like to contribute the following to this discussion:

First, the concept of temporal decay or time smearing - measured in dB/microsec - a suggestion that since the ear is a non-linear detection instrument, frequency response and distortion are insufficient by themselves to measure resolution and quality of recording and playback:

Temporal Decay: A Useful Tool for the Characterization of Resolution of Audio Systems
http://ip565bfb2a.direct-adsl.nl/articles/vmaanen-files/temporal-decay.pdf

Then, the finding that human temporal resolution extends down to a threshold of 5 microsec (possibly even lower).
http://www.physics.sc.edu/kunchur/temporal.pdf

I know that you followed the Cable Theory thread here: http://www.whatsbestforum.com/showthread.php?1988-Cable-Theory
where Amir showed phase shifts of 1 deg at 7kHz - which at that point we thought was not audible.

A 1 deg phase shift at 7kHz corresponds to a time shift of 6 microsec. So, may be now it's audible?

DBT to confirm :D

Ehh... 0.6 microsec
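The conversion from phase shift to time shift can be checked in a couple of lines. This is a minimal Python sketch, assuming a pure sine tone so that one full cycle (360 deg) lasts one period 1/f; the function name is illustrative.

```python
def phase_to_time_us(phase_deg: float, freq_hz: float) -> float:
    """Time shift in microseconds for a phase shift at a given frequency."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    period_us = 1e6 / freq_hz  # duration of one full cycle in microseconds
    return period_us * phase_deg / 360.0

if __name__ == "__main__":
    print(f"{phase_to_time_us(1.0, 7000.0):.2f} microsec")
```

For 1 deg at 7 kHz this works out to roughly 0.4 microsec: the period is about 143 microsec and one degree is 1/360 of it. That is the same order of magnitude as the corrected figure above and well under the 5 microsec threshold cited from the temporal resolution paper.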
 

microstrip

First, the concept of temporal decay or time smearing - measured in dB/microsec - a suggestion that since the ear is a non-linear detection instrument, frequency response and distortion are insufficient by themselves to measure resolution and quality of recording and playback:

Temporal Decay: A Useful Tool for the Characterization of Resolution of Audio Systems
http://ip565bfb2a.direct-adsl.nl/articles/vmaanen-files/temporal-decay.pdf

Then, the finding that human temporal resolution extends down to a threshold of 5 microsec (possibly even lower).
http://www.physics.sc.edu/kunchur/temporal.pdf

Thanks Gary - this is not coffee-talk anymore!
Only with this type of scientific research can audio technology grow from an empirical phase into a science-based technology.
 

Ethan Winer

Have you tried to test cables yourself Ethan? If so, how did you go about it?

In all honesty, I don't think I've ever tested a piece of wire in my entire 62 years. Of course, I've "tested" wires to verify they're not intermittent. But not an A/B comparison to tell if one sounds different than another. This would be easy enough to do. I wouldn't bother with a listening test though. I'd just stick a voltmeter across the circuit and sweep a sine wave from 20 Hz through 20 kHz. If the voltages at the far end of the wire were the same as at the sending end, I'd be satisfied that the cable is (correctly) not affecting the sound.
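The voltmeter check described here boils down to comparing the voltage at each end of the cable at every point of the sweep and expressing the difference in dB. A rough Python sketch follows; the readings are made-up placeholder numbers, not measurements, and the 0.1 dB tolerance is an arbitrary assumption for illustration rather than an established audibility threshold.

```python
import math

def deviation_db(v_send: float, v_far: float) -> float:
    """Level at the far end relative to the sending end, in dB."""
    if v_send <= 0 or v_far <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(v_far / v_send)

def worst_deviation(readings):
    """readings: iterable of (freq_hz, v_send, v_far). Returns (freq_hz, dB)."""
    return max(
        ((f, deviation_db(vs, vf)) for f, vs, vf in readings),
        key=lambda item: abs(item[1]),
    )

if __name__ == "__main__":
    # Hypothetical sweep points: (frequency in Hz, volts in, volts out).
    sweep = [(20, 1.000, 1.000), (1_000, 1.000, 0.999), (20_000, 1.000, 0.995)]
    freq, db = worst_deviation(sweep)
    tolerance_db = 0.1  # assumed tolerance, for illustration only
    print(f"worst deviation {db:+.3f} dB at {freq} Hz")
    print("within tolerance:", abs(db) <= tolerance_db)
```

Note that a voltmeter reading captures magnitude only, so phase or timing effects like those raised earlier in the thread would not show up in a check of this kind.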

--Ethan
 

About us

  • What’s Best Forum is THE forum for high end audio, product reviews, advice and sharing experiences on the best of everything else. This is THE place where audiophiles and audio companies discuss vintage, contemporary and new audio products, music servers, music streamers, computer audio, digital-to-analog converters, turntables, phono stages, cartridges, reel-to-reel tape machines, speakers, headphones and tube and solid-state amplification. Founded in 2010 What’s Best Forum invites intelligent and courteous people of all interests and backgrounds to describe and discuss the best of everything. From beginners to life-long hobbyists to industry professionals, we enjoy learning about new things and meeting new people, and participating in spirited debates.
