Double Blind Testing and the threshold of necessity

RBFC

WBF Founding Member
Apr 20, 2010
5,158
46
1,225
Albuquerque, NM
www.fightingconcepts.com
While the protocol for Double Blind Testing (DBT) is fairly well known, what is the threshold (for differences between devices under test) at which DBT is deemed necessary to validate the assertions of "audible vs. inaudible"?

How much consensus is necessary to eliminate the need for DBT, and what assurances do you feel are required to support this?

Lee
 

Phelonious Ponk

New Member
Jun 30, 2010
8,677
23
0
While the protocol for Double Blind Testing (DBT) is fairly well known, what is the threshold (for differences between devices under test) at which DBT is deemed necessary to validate the assertions of "audible vs. inaudible"?

How much consensus is necessary to eliminate the need for DBT, and what assurances do you feel are required to support this?

Lee

I think you'll find a wide variation of answers to that one, from DBT is not legitimate, to DBT is never necessary if you trust your ears, to DBT is always necessary (the Hydrogen Audio view). My personal view is that DBT is not necessary if you simply like what you perceive and don't care where the line lies between psychoacoustics and expectation bias. But if you're telling me you have a product that is audibly superior for reasons that should make no difference at all and/or in ways that cannot be measured, I think DBT is a requirement for credibility. Is it possible, as some here are currently arguing, that there are audible factors at work that simply can't be measured by today's technology? Sure, that's possible. But if they are, indeed, audible, proper DBT can settle the issue. Yet they remain typically untested, particularly by the manufacturers selling the products.

Consensus? A high amount of consensus can be reached among internet audiophiles on almost anything. Consensus is no measuring stick at all in my view.

Tim
 

Gregadd

WBF Founding Member
Apr 20, 2010
10,581
1,798
1,850
Metro DC
Thanks Lee it was an OTC question. As a matter of fact my cursory research of a proper DBT protocol indicates most audiophiles either don't know or ignore a true DBT protocol analysis and how to properly analyze the statistical results.

Harman Kardon is a good place to start
 

Phelonious Ponk

New Member
Jun 30, 2010
8,677
23
0
Thanks Lee it was an OTC question. As a matter of fact my cursory research of a proper DBT protocol indicates most audiophiles either don't know or ignore a true DBT protocol analysis and how to properly analyze the statistical results.

Harman Kardon is a good place to start

Gregg, I agree on both counts.

Tim
 

mep

Member Sponsor & WBF Founding Member
Apr 20, 2010
9,481
17
0
Consensus? A high amount of consensus can be reached among internet audiophiles on almost anything.

Tim

Really Tim?? I find that audiophiles can't agree on anything as a group. The only consensus is that there is no consensus. Further, if I was at your house and you told me the sun was shining, I would stick my head out the window to verify it for myself.
 

microstrip

VIP/Donor
May 30, 2010
20,807
4,704
2,790
Portugal
As a matter of fact my cursory research of a proper DBT protocol indicates most audiophiles either don't know or ignore a true DBT protocol analysis and how to properly analyze the statistical results.

Greg,
Valid point. I have mentioned it several times - audiophiles do not have the knowledge or resources to reach valid conclusions using DBT protocols. The typical challenges we find in this forum are meaningless and have no statistical foundation. Even people who are used to engineering or scientific statistics have not mastered the behavioral statistics needed to study psycho-physiological processes. But these sessions can be enjoyable if you like to entertain your friends (or maliciously embarrass some of them ...)

Some manufacturers use DBT methods with great expertise just as a statistical "orientation tool" for audio development, not to do science. Why spend large resources correcting or developing a feature that only one person in a thousand will accurately note, even if he is 100% correct every time? When you are putting your money in something, you should use the proper tools. :)
 

amirm

Banned
Apr 2, 2010
15,813
38
0
Seattle, WA
It is easy to go after low-hanging fruit. DBT is not necessary if:

1. To any casual observer off the street, the difference is obvious. I usually test this threshold with my wife and children. If they have no trouble whatsoever in telling the difference and it is in the expected direction, I don't bother with DBT.

2. There are gross differences. If an amp rolls off its highs at 10 kHz, you don't need a blind test to know it sounds different from another. If the level is different, you are good to go also.

3. You have trained listeners who are usually right. You risk some level of error here but good enough for government work as they say :). BTW, this is the method that is used to test fidelity 99% of the time. Double-blind tests are expensive and take a long time to run, so you don't want to use them for every iteration if you are in this to make a living.
 

Gregadd

WBF Founding Member
Apr 20, 2010
10,581
1,798
1,850
Metro DC
Interesting Amir. I thought DBT was necessary if you disagree with the other guy's position. Not necessary for me because I know I'm right. :D
 

amirm

Banned
Apr 2, 2010
15,813
38
0
Seattle, WA
I think that is the difference between audio and other sciences where DBT is practiced. If you are testing a drug, then nothing can be presumed to be effective or safe. When testing audio, and there are measured gross differences, then you don't need to follow that path for example. Ditto for using expert listeners as a shortcut. There is no risk of someone dying other than stress of arguing a point on a forum :D.
 

mep

Member Sponsor & WBF Founding Member
Apr 20, 2010
9,481
17
0
Tomelex-Sounds like your son's TV has a problem or isn't set up correctly. CRT TVs are good for being used as a boat anchor.
 

Paul Spencer

Well-Known Member
Oct 4, 2010
48
0
296
I was involved in two listening tests where we compared digital active crossovers. We compared them working purely as digital converters, and then using the same filters. We followed my suggestion to do instant switching, but only managed to get it working on the second event. It was an informal series of tests with no intention to convince anyone else - more a matter of satisfying our own curiosity. Later we decided to share the result online, and I posted in a few different places. We were criticised for not doing it blind - I should have expected that. The group as a whole wasn't strongly biased towards any significant outcome - we suspected that differences would be very minor or not there at all. That is mostly what we found, except where measurements showed we had a different response. My contention was that the instant switching method was adequate to overcome any bias issues. We could switch in the middle of a sustained note and listen only for differences at the switch. No audio memory. We switched so frequently that it was not really possible to be biased because you quickly lose track unless you are deliberately trying to cheat. Yet some are never satisfied unless it's done double blind ABX style.

I would much rather have a sighted instant switching test with 10 enthusiasts than a double blind ABX test with 100 people that relies on audio memory. I contend that even when tested blind, if one has to remember what they heard then compare it to what they hear right now, audio memory invalidates the test when the differences are subtle. Our listening system just isn't well designed for that kind of comparison. The same is true of our sight. Take two colour samples that are just slightly different and hold them apart. Can you tell which one is darker? Could 100 people tell? Even if they do no better than 50/50, any one person could pick the difference if you put them right next to each other with nothing in between. If this is true with sight, which we by nature tend to trust more than our ears, then how much more true is it of audio?

Where is the threshold? When do you need to test it blind? Well, perhaps the answer is also related to your goal. Do you want to satisfy your own curiosity only? Then I'd do an instant switch test and do it long enough to form an opinion. Do you want to try to prove a point? It may be that amplifiers that measure within certain conditions will sound the same when not clipping. Or do cables sound different? That suggests a blind test. However, IMHO bias might not always be the major concern with a test. If you are doing your own small test and your bias is "there won't be a difference" then if anything that bias may cause you to miss very small differences. However, bigger concerns are spurious factors that can influence the result. Levels not matched. Frequency response differences that alone could account for differences perceived.
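
As a rough illustration of the "levels not matched" concern, here is a minimal Python sketch that compares the RMS level of two captures before any listening is done. Everything in it is an assumption for illustration: the function names, the synthetic 1 kHz test tones, and especially the 0.1 dB tolerance, which is only a commonly quoted rule of thumb and not something established in this thread.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Level of capture `a` relative to capture `b`, in dB."""
    rms_a, rms_b = rms(a), rms(b)
    if rms_a == 0 or rms_b == 0:
        raise ValueError("cannot compare against a silent capture")
    return 20.0 * math.log10(rms_a / rms_b)

def levels_matched(a, b, tolerance_db=0.1):
    """True if the two captures are within `tolerance_db` (assumed threshold)."""
    return abs(level_difference_db(a, b)) <= tolerance_db

def sine(amplitude, freq_hz=1000.0, sample_rate=48000, seconds=0.1):
    """Hypothetical test tone standing in for a real measurement capture."""
    count = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]

if __name__ == "__main__":
    device_a = sine(1.0)
    device_b = sine(0.9)  # a bit under 1 dB quieter than device_a
    print(f"difference: {level_difference_db(device_a, device_b):.2f} dB")
    print("matched:", levels_matched(device_a, device_b))
```

A check like this only covers overall level. A frequency response difference would need a per-band comparison, which this sketch does not attempt.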
 

amirm

Banned
Apr 2, 2010
15,813
38
0
Seattle, WA
My contention was that the instant switching method was adequate to overcome any bias issues.
Sadly, I know first hand that the above is not true :). On more than one occasion, I have heard substantial differences between two audio samples with instant A/B, yet was told after the fact that the samples were identical! Amusingly enough, once told they were identical they, well, sounded identical. Now this part will blow your mind. I tested to see if I could talk myself into hearing differences yet again and I could! :eek:
 

JackD201

WBF Founding Member
Apr 20, 2010
12,319
1,429
1,820
Manila, Philippines
DBTs are expensive, yes, but the reason they are very important for product development is that they can and do save the manufacturer a lot in costs when done properly and used for this purpose. Whether it is the choice of a capacitor, a driver, or a choice between sugar and corn syrup in a soft drink, DBTs answer one fundamental question. Between the more expensive A and the cost effective B, for the target market, is B good enough? It will NOT tell you if A is better than B, or vice versa, for every respondent base. In other words it will not answer what is best for everybody. Big companies with big runs obviously want products that appeal to as many individuals as possible, and there is a correlation between what is deemed good enough by trained panels being good enough below. It doesn't go the other way. Still, the correlation is not a perfectly direct one, at least not yet.

As such, the respondents must be carefully selected to be representative of the target market. This alone indicates that the thresholds vary wildly.

So why do they vary so much in a DBT audibility scenario? We go back to the makeup of, and need for, trained respondents. There is a huge difference between being able to hear something and knowing what it is you are hearing. The sensory input may be present but the brain of an untrained respondent won't know what the heck to do with it. The results get skewed either as 50/50 guesses or "Don't hear it" responses. Another scenario is that the respondent does hear it and can identify it but he can't verbalize his observations in the same language as the other respondents. Again this is fine if the study was to figure out what percentage of a wide population can't make heads or tails of a parameter but I don't think that's what we're talking about here in a high performance (not high price) oriented forum.
 

garylkoh

WBF Technical Expert (Speakers & Audio Equipment)
Sep 6, 2010
5,599
225
1,190
Seattle, WA
www.genesisloudspeakers.com
Sadly, I know first hand that the above is not true :). On more than one occasion, I have heard substantial differences between two audio samples with instant A/B, yet was told after the fact that the samples were identical! Amusingly enough, once told they were identical they, well, sounded identical. Now this part will blow your mind. I tested to see if I could talk myself into hearing differences yet again and I could! :eek:

I've experienced the same. I was doing an ABX of two digital files - one at 24/192 and the other down-sampled to 24/44.1 and then up-sampled back to 24/192. I got over 90% correct in the first two trials. Then I compared the two spectra and found that there was very little difference. This might be due to the up-sampling of 44.1 restoring some of the frequencies..... and now I can't tell the two files apart.

Where does that leave DBT? Or maybe the better question is - how much of all this is nothing more than psychoacoustics?

When we indulge in other hobbies - fine watches and fine wine come to mind - we don't DBT, and we are happy with what we drink, and we prove it with our wallets. Paying more doesn't give you better time. I was guilty of buying an expensive tourbillon, and all it was good for was showing off. I tried to explain to my wife that what it does is overcome gravitational effects to improve accuracy, and she thought I was nuts. When she got worried that someone would chop my wrist off to steal the watch, I knew it was time to get rid of it.

When audiologists test for the lower threshold of hearing, they accept 80% accuracy, but that's not good enough for the naysayers who insist cables make no difference. And yet, we can always convince ourselves to hear a difference even if there is no difference.
 

Phelonious Ponk

New Member
Jun 30, 2010
8,677
23
0
if the study was to figure out what percentage of a wide population can't make heads or tails of a parameter but I don't think that's what we're talking about here in a high performance (not high price) oriented forum.

I think that is often exactly what we're talking about in a high performance-oriented forum: Not the quality of what we might hear, but whether or not there is an audible difference at all.

Tim
 

JackD201

WBF Founding Member
Apr 20, 2010
12,319
1,429
1,820
Manila, Philippines
It's the proverbial half full/half empty conundrum Tim. I participate in high performance forums to learn things and to get me thinking outside of my box. I'm not out to evangelize my way of doing things and tear people down in the process. I'd like to think that anybody who participates in any high performance forum, whether it be cars, bikes, boats, video, whatever, thinks the same way. So no. It is quality not audibility we're talking about here. The name of the forum and its mission statement are crystal clear.
 

JackD201

WBF Founding Member
Apr 20, 2010
12,319
1,429
1,820
Manila, Philippines
The way I see it Tom is when anybody in this hobby/industry, consumer or manufacturer, checks for audibility we're testing for audible qualities anyway. That's what I mean by DBT being useful as far as finding out "what's good enough" for that individual or test group. That's as far as it goes though. One might even say that is the threshold in a simplistic yet common sense kind of way. It really is a moving target however because ultimately this hobby is individualistic in nature. Like it or not preferences rule. You just can't please everybody.
 

Ethan Winer

Banned
Jul 8, 2010
1,231
3
0
75
New Milford, CT
if you're telling me you have a product that is audibly superior for reasons that should make no difference at all and/or in ways that cannot be measured, I think DBT is a requirement for credibility.

Emphasis added by me, and this sums it up perfectly. As I said in the Bass Traps thread over in my forum section, a DBT is needed either when a difference makes no scientific sense, or when a difference can't be measured. So then a DBT can be used as proof of a real difference.

--Ethan
 

microstrip

VIP/Donor
May 30, 2010
20,807
4,704
2,790
Portugal
When audiologists test for the lower threshold of hearing, they accept 80% accuracy, but that's not good enough for the naysayers who insist cables make no difference. And yet, we can always convince ourselves to hear a difference even if there is no difference.

We should use better statistical tools to establish this threshold. The main idea is that we should compare the result with a random result - if there is a meaningful difference the test is called positive, and we accept there is a difference. The "meaningful difference" is related to the experiment conditions, method and even number of trials. In some experiments 51% can be meaningful ...
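
To make the comparison against a random result concrete, here is a minimal Python sketch of a one-sided binomial calculation for an ABX-style test. It assumes each trial is an independent two-way choice where pure guessing is right half the time. The function names and the 0.05 cut-off are illustrative assumptions, not a standard anyone in this thread has endorsed.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the equally likely guess sequences that do at least this well,
    # then divide by the total number of sequences (2 ** trials).
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

def min_correct_needed(trials: int, alpha: float = 0.05):
    """Smallest score whose guessing probability is at or below `alpha`.

    Returns None when even a perfect score cannot reach `alpha`.
    """
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return None

if __name__ == "__main__":
    for trials in (10, 16, 25):
        print(trials, "trials:", min_correct_needed(trials), "correct needed")
    print("8 of 10 by guessing:", round(abx_p_value(8, 10), 4))
```

Under these assumptions, 8 correct out of 10 (the 80% figure quoted above) has about a 5.5% chance of happening by guessing alone, so it just misses a 0.05 cut-off, while 9 of 10 clears it. The same arithmetic is why the number of trials matters so much: with something on the order of ten thousand trials, even 51% correct would be very unlikely to come from guessing.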
 

Gregadd

WBF Founding Member
Apr 20, 2010
10,581
1,798
1,850
Metro DC
Emphasis added by me, and this sums it up perfectly. As I said in the Bass Traps thread over in my forum section, a DBT is needed either when a difference makes no scientific sense, or when a difference can't be measured. So then a DBT can be used as proof of a real difference.

--Ethan
The problem I am having is that certainly if something is suspect because it can't be measured, it is most definitely suspect if it is in contravention of measurements and needs to be contradicted by science.
I can't find the quote right now but Ethan responded to several posts saying if you can't measure it you can't hear it. Moreover Ethan has repeatedly postulated that even if you can measure it, that does not mean it is audible. The fact that Ethan let a sighted listening test trump measurements to me represents an epiphany. Others can draw their own conclusions.

I agree wholeheartedly that some things are obvious.
 
