Thread: Do blind tests really prove small differences don't exist?

  1. #1
    Site Founder And Administrator amirm's Avatar
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,258

    Do blind tests really prove small differences don't exist?

    Admin note: this thread started at the end of the jitter thread, and a good suggestion was made to make it its own thread. So here it is.

    Quote Originally Posted by sasully View Post
    What he's responding to is John Atkinson claiming that it's extraordinarily hard for blind tests 'to produce anything but a null result even when real audible differences exist' and it is a test that 'does not work'. Which is simply wrong.
    I am curious where I would read about that, Steven.

    The power of blind testing comes from elimination of bias. It does that powerfully, and this is easy to see and prove using real data. Its reverse role, finding small differences, is much more difficult if not impossible to prove. To wit, I can make a change to the system that is measurable and strongly so, yet not found in a blind test. The fact that we cannot use objective data to determine if our objective test is working puts us in a tough, tough situation.

    Complicating matters, I can show that one person can hear such differences and another cannot using the exact same methodology. Is it that the difference is not audible to the latter person or that the test made it harder for him? How do I disambiguate that as a matter of science?

    In another thread, I hypothesized based on my personal experience that blind tests may provide too conservative a view of audible differences. The theory I put forth was that if the mind can manufacture differences or imagine them being larger than they are, there is no reason to think that it can't do the reverse, second-guessing itself in a blind test and erasing a difference that may be there. And it doesn't have to do that often to cause the results to become "statistically insignificant."
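
    One way to put rough numbers on the second-guessing theory above, offered as a minimal sketch rather than anything taken from a published protocol: assume a hypothetical 16-trial forced-choice test scored with a one-sided binomial criterion at the 5% level, and a listener who would answer every trial correctly but doubts himself on some fraction of trials and effectively flips a coin on those. The trial count, the function names and the coin-flip model are all assumptions made for illustration.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_correct(n, alpha=0.05):
    """Smallest score a pure guesser reaches with probability <= alpha.

    Returns n + 1 when no score is rare enough (very short tests),
    meaning the test cannot be passed at this alpha.
    """
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1

def pass_probability(second_guess_rate, n=16, alpha=0.05):
    """Chance that a listener who truly hears the difference passes the test.

    Assumed model (hypothetical): on a fraction `second_guess_rate` of trials
    the listener doubts himself and guesses; otherwise he answers correctly.
    """
    p_correct = 1.0 - 0.5 * second_guess_rate
    return binom_tail(n, critical_correct(n, alpha), p_correct)

if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.4, 0.6, 0.8):
        print(f"second-guessing on {rate:.0%} of trials -> "
              f"passes with probability {pass_probability(rate):.2f}")
```

    Under these assumptions, second-guessing on 60% of trials (70% correct overall) leaves the listener passing only about 45% of the time, even though the difference is fully audible to him in this model.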

    I am very interested in figuring out how to prove that real differences that are heard by the ear and the brain are indeed always detected in blind tests. Are there papers or studies I can read about this in the field of audio?
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  2. #2
    Quote Originally Posted by amirm View Post
    I am curious where I would read about that, Steven.
    You can listen to Atkinson say it, in that mp3. If it's his *wrongness* you're referring to, IIRC JJ's powerpoint presentations online are a good place to start. He seems fairly convinced that DBTs using trained listeners are an excellent means to detect small differences. (Does this topic lie in the 10% of difference you have with him?)

    The power of blind testing comes from elimination of bias. It does that powerfully, and this is easy to see and prove using real data. Its reverse role, finding small differences, is much more difficult if not impossible to prove. To wit, I can make a change to the system that is measurable and strongly so, yet not found in a blind test. The fact that we cannot use objective data to determine if our objective test is working puts us in a tough, tough situation.
    Well, define 'strongly so'. Depending on the range being measured, and the parameter being measured, a numerically large difference could still be inaudible, could it not?
    Complicating matters, I can show that one person can hear such differences and another cannot using the exact same methodology. Is it that the difference is not audible to the latter person or that the test made it harder for him? How do I disambiguate that as a matter of science?
    If you can show in repeatable trials that one person can hear the difference, I'd say the difference is audible. (Scientists prefer an n of at least three.) If you can *only* find one person out of a large sample of people that can do that, I'd say it's audible but unlikely to be so to most people. The reason why it's inaudible to others could be multifold... another topic for investigation. It might be as simple as, one person has more acute hearing than the other. It's probably something rather mundane.
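
    The two cases above (one listener passing repeatedly versus one listener out of a large pool passing once) can be told apart with a little arithmetic. What follows is a rough sketch under assumed conditions: a hypothetical 16-trial forced-choice test scored at the 5% level, every listener purely guessing, and all runs independent. None of these numbers come from the tests discussed in this thread, and the function names are made up for illustration.

```python
from math import comb

def guess_pass_rate(n=16, alpha=0.05):
    """Actual chance that a pure guesser passes an n-trial test at level alpha."""
    def tail(k):
        # P(X >= k) for a fair-coin guesser over n trials
        return sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    for k in range(n + 1):
        if tail(k) <= alpha:
            return tail(k)
    return 0.0  # test too short to be passed at this alpha

def any_lucky_pass(listeners, n=16, alpha=0.05):
    """Chance that at least one of `listeners` pure guessers passes once."""
    return 1.0 - (1.0 - guess_pass_rate(n, alpha)) ** listeners

def repeated_lucky_pass(runs, n=16, alpha=0.05):
    """Chance that one pure guesser passes `runs` independent tests in a row."""
    return guess_pass_rate(n, alpha) ** runs

if __name__ == "__main__":
    print(f"one lucky pass among 50 guessers: {any_lucky_pass(50):.2f}")
    print(f"same guesser passing 3 runs:      {repeated_lucky_pass(3):.6f}")
```

    With these assumptions, a single pass somewhere in a pool of 50 guessers is more likely than not (around 86%), while the same guesser passing three runs in a row is very unlikely (well under 1 in 10,000). That is the statistical reason a repeatable result from one person carries more weight than a lone pass in a large sample.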




    In another thread, I hypothesized based on my personal experience that blind tests may provide too conservative a view of audible differences. The theory I put forth was that if the mind can manufacture differences or imagine them being larger than they are, there is no reason to think that it can't do the reverse, second-guessing itself in a blind test and erasing a difference that may be there. And it doesn't have to do that often to cause the results to become "statistically insignificant."
    Theories are fun, aren't they? One wonders whether they are really necessary in this case, or whether it's just special pleading -- audiophiles seem to hate the idea that their audio perception apparatus isn't a flawless detector of quality.

    I am very interested in figuring out how to prove that real differences that are heard by the ear and the brain are indeed always detected in blind tests. Are there papers or studies I can read about this in the field of audio?

    Well, how would you determine that differences are 'heard by the ear and the brain' without a listening test? fMRI? And is physiological activity detected in the ear and brain the same as 'heard'? Even in the visual system, our eye 'sees' lots of things that the brain edits out.

    And by 'always' detected in blind tests, are you suggesting that all blind tests are equally well-constrained? Or are you referring to a particular protocol, e.g., one proposed as AES standard?
    Last edited by sasully; 05-13-2011 at 11:21 AM.

  3. #3
    Site Founder And Administrator amirm's Avatar
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,258
    Quote Originally Posted by sasully View Post
    You can listen to Atkinson say it, in that mp3. If it's his *wrongness* you're referring to, IIRC JJ's powerpoint presentations online are a good place to start.
    I meant the latter. I have watched JJ's presentations, but I am not sure I watched them to completion or that they covered this point. Do you remember his explanation and logic? And can you talk it through in your own mind?

    He seems fairly convinced that DBTs using trained listeners are an excellent means to detect small differences. (Does this topic lie in the 10% of difference you have with him?)
    No, I have not had this discussion with him. I wouldn't be surprised at all if he held this position, but I don't know the reasoning he might have.

    Well, define 'strongly so'.
    As in completely changing the psychoacoustic model of a codec, and hence all the bits changing, yet the outcome not being conclusive at all that anything changed. Take this simpler example. We tested 2:1 compression of music from 1.4 Mbit/sec to 750 kbps. In doing so, all the samples are changed, as in the other example, yet detectability was exceptionally low. I could tell the difference, and I think a few others could, but tens of others could not.

    Yes, the codec does an excellent job of hiding its artifacts, but the quandary remains: we know we changed the samples drastically. We know some people heard it. But many, many people could not. How does one disambiguate whether the few of us had better listening ability vs blind testing disadvantaging people who are not used to running it?

    Depending on the range being measured, and the parameter being measured, a numerically large difference could still be inaudible, could it not?
    Of course. I am looking for a protocol to prove that and find that it gives me a headache. I am often fascinated by how astrophysicists (?) think of clever schemes to figure out the existence of celestial bodies light years away based on gravity, X-rays and such, and wonder out loud if there is a way to create an experiment that quantifies this issue.

    When we use an instrument, we know its accuracy level. It seems odd that we use blind listening tests yet we don't know how much they could misfire. Are they wrong 10%, 20%, 30%, 50%? I have had third parties run blind tests that we knew to be 100% wrong and indeed, they arrived at perfectly wrong conclusions! In those cases, we knew the data set they used was not revealing of what was being measured. But in many of these other cases we are in the dark.
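
    For an idealized test, the "how often do they misfire" question above does have a partial statistical answer: the miss rate (a null result despite a real, partly audible difference) follows from the listener's true per-trial hit rate and the number of trials. The sketch below assumes a simple forced-choice test with independent trials, scored one-sided at the 5% level. The hit rates, the 20% miss-rate target and all function names are hypothetical, and a real listener's hit rate is exactly the thing we do not know.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_correct(n, alpha=0.05):
    """Smallest passing score at level alpha; n + 1 if the test is unpassable."""
    total, k = 0, n + 1
    for i in range(n, -1, -1):
        total += comb(n, i)          # outcomes with at least i correct
        if total / 2**n > alpha:     # a guesser gets here too often
            break
        k = i
    return k

def miss_rate(p_correct, n, alpha=0.05):
    """Chance of a null result for a listener whose true hit rate is p_correct."""
    return 1.0 - binom_tail(n, critical_correct(n, alpha), p_correct)

def trials_needed(p_correct, max_miss=0.2, alpha=0.05, max_n=200):
    """Smallest trial count with miss rate <= max_miss, or None up to max_n."""
    for n in range(1, max_n + 1):
        if miss_rate(p_correct, n, alpha) <= max_miss:
            return n
    return None

if __name__ == "__main__":
    for p in (0.6, 0.7, 0.8, 0.9):
        print(f"hit rate {p:.0%}: miss rate at 16 trials = {miss_rate(p, 16):.2f}, "
              f"trials for <=20% misses = {trials_needed(p)}")
```

    The design point is that the "accuracy" of the test is not one number. A listener who is right 90% of the time needs only a handful of trials, while one who is right 70% of the time is missed more often than not at 16 trials.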

    If you can show in repeatable trials that one person can hear the difference, I'd say the difference is audible. (Scientists prefer an n of at least three.) If you can *only* find one person out of a large sample of people that can do that, I'd say it's audible but unlikely to be so to most people.
    I don't know about the last part of this sentence. I would say that is an assumption that I would like us to challenge and understand. I want to know why the other people did not hear it. We can't say we believe in objective testing yet not have an objective and positive proof point for that.

    Well, how would you determine that differences are 'heard by the ear and the brain' without a listening test? fMRI? And is physiological activity detected in the ear and brain the same as 'heard'? Even in the visual system, our eye 'sees' lots of things that the brain edits out.
    I have asked this question before. We have a number of people in the medical field here so perhaps some suggestions can be made. My sense however is that medical measurements are nowhere near accurate enough to detect these differences, but I am just guessing.

    And by 'always' detected in blind tests, are you suggesting that all blind tests are equally well-constrained? Or are you referring to a particular protocol, e.g., one proposed as AES standard?
    I am referring to all of them. Here is a scenario:

    You are comparing two samples. You hear a difference. Then you think, maybe I am being tricked. After all, there is a hidden reference in there and I sure don't want to look like a fool for not spotting it. Or, if you don't know that, you reason from previous samples that sounded the same that this one is probably the same too and you are imagining the difference. So you try to convince yourself that there is no difference and, lo and behold, there isn't any, and you vote that way.

    Now what if it turns out that there was a difference after all, one that others detected? Did we just get a more pessimistic result from this tester because we subjected him to a test protocol?
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  4. #4
    WBF Founding Member and Super Moderator JackD201's Avatar
    Join Date
    Apr 2010
    Location
    Manila, Philippines
    Posts
    5,792
    From my perspective the answer is no, blind tests do not prove small differences do not exist. Why? Because the output of blind tests is statistical. Being such, there will always be the probability of different results for the same test when performed on different sets of subjects. One group may have outliers while others might not. In other words, given the identical test, 1+1 may or may not be equal to 2.

    It is still an invaluable tool however when biases need to be erased. Like Amir says, bias goes both ways. There are positive and negative biases. Sean tells us it takes time to get reliable results from a testing panel on small differences. It takes training and experience. The subjects have to be so used to taking blind tests that there is no anxiety about having to prove anything about themselves one way or the other.

  5. #5
    Addicted to Best!
    Join Date
    Jul 2010
    Location
    bathurst NSW
    Posts
    517
    Quote Originally Posted by JackD201 View Post
    From my perspective the answer is no, blind tests do not prove small differences do not exist. Why? Because the output of blind tests is statistical. Being such, there will always be the probability of different results for the same test when performed on different sets of subjects. One group may have outliers while others might not. In other words, given the identical test, 1+1 may or may not be equal to 2.
    Ok, what about the more *usual* type of test (hmm, IS it more usual?).

    Specific guy A says 'I hear yada yada', which according to the other side is 'debatable' at least.

    So Specific guy A does the test, in his own home, on his own gear and now fails to hear what he said he heard.

    Ok maybe that is not so usual at all, but as a thought experiment how far does it go to answer your considerations??

    Amir, I appreciate what I think is the thinking behind this, and I will re-read it, but while answering now I can't help thinking that it is starting from the wrong point, or for the wrong reason?

    At least from my perspective.

    It's kinda a chicken and egg thing maybe, 'DBTs have trouble revealing the small differences' only has validity IF you believe those small differences are real.

    'DBTs have trouble revealing small differences' has no validity if you feel those small differences do not exist.

    Haha, maybe the old truism is true, the truth lies in the middle somewhere?...(yes, I get that is what you are asking)

    You are asking for papers etc on this?? Would there be any?

    The counter to your example where we know the signal was markedly different yet not heard, are the ones where we know there was NO change in the signal, yet a difference was heard. (bias, already acknowledged).

    Personally I suppose these sorts of musings give me direction on how to view it all. IF actual real differences are NOT heard in a dbt, then (for me) it is ok to completely ignore them. That is not an answer to the question, but I feel it is a useful way to use the results of dbt's.

    The other way to look at Jack's point (1+1 does not always equal two) is 'who cares' if someone, somewhere eventually can hear the difference between two cables. Yet every audiophile cable believer out there WILL think 'a-hah, I can hear it too'. I do not hold my hearing in so high a regard to suppose I have that ability.

    I know people can run the 100m in <10 secs, the perception of my self worth does not require me to think I can do it too.


    That IS all off topic to a degree, yet in some way is an answer to the question of the worth of dbt's.

  6. #6
    Addicted to Best! tomelex's Avatar
    Join Date
    Jul 2010
    Location
    USA
    Posts
    1,842
    Good points above gentlemen.

    I think blind tests reveal rather how large a difference must exist before a particular individual registers it. One can easily go lower, and then it sometimes becomes, as has been stated above, some sort of internal guessing game.

    When I make, say, changes to a circuit, I have to mix myself up, i.e. switch the change in and out so many times that I am not aware of which way is which, so that I am at ease to listen for changes and for which I might prefer. One must absolutely remove the bias factor IMO. Hearing changes is of course not the same as hearing "closer to the real thing", which to me is not a valid goal with two channel stereo anyway.

    Tom
    Tom
    ____
    It's impossible for stereo two channel mic/speakers to realistically replicate unamplified musical events. The resulting unrealistic reproduction must be accepted or leaves some desiring more. Some endlessly change components pursuing the impossible. With 10 being realistic replication, I generously give stereo a rating of 5 for "getting me there". I rate binaural via headphones 8. I pursue detail/tone over soundstage. Objectivists and Subjectivists debate an ILLUSION!

  7. #7
    Site Founder And Administrator amirm's Avatar
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,258
    Here is some more food for thought and what I have experienced in real life, taking these tests.

    Let's assume there is a spectrum from 1 to 100. At 1, everyone hears a solid difference with 100% confidence. And at 100 the opposite is true: no one hears anything no matter what.

    Let's further assume that the transition point is at 70, where things start to tilt from one direction to the other.

    What happens to your evaluation of what you are hearing as you climb toward 70? We know from electronic circuits called comparators that they become unstable in that area and their function ceases to work. My brain does the same thing. It thinks it hears something but can't swear by it. So it vacillates especially when I am being tested and have to take sides.

    My theory is that blind tests are unreliable when used in that transition point. Our ear and brain combination are simply not designed to give binary answers in that area that the test requires.

    Working backwards, we know that differences do exist in the transition area because that is the definition of a transition area. It is a gray region between no difference and a difference.

    For mass market products, we don't care about the transition area. We can opt to assume that any uncertainty is not worth the equipment cost to chase out. This is important as these discussions do not erupt with average consumers but with audiophiles. For them, the extra cost is justified if it rules out audibility issues even some of the time (think of a few-millisecond transient of a piano or guitar pick).

    To wit, I rip into lossless format because no matter how high I set the bit rate of a lossy codec, once in a while I hear an annoying artifact for a fraction of a second, and even if the rest of the song is perfect, that bothers me.

    The above is why I would like to know how we can quantify where in that spectrum the test is. I look at blind tests and I see someone just getting some random "audiophile" content and assuming it is revealing of the issue at hand (jitter, high resolution formats, etc.). We never do that in other proper areas of audio research. For compression tests, we use tracks that are hugely revealing of artifacts. This pushes us past the 70% point above.

    Using trained listeners also helps to push us out of this hazy area as mentioned because they are able to hear things more readily and less inclined to second guess themselves.

    All of this is compounded by the inductive logic people want to use to take the results of one blind test and apply them to all people, all content and all equipment. We know that no test is applicable that way when it produces negative results. It is tempting however to say, "well, 500 people couldn't hear the difference statistically, therefore no one can." I say before you can say this, you at least need to demonstrate whether you were or were not in the 70% area. Because if you are in the 70% area, your testers were "oscillating" in how they were voting, which is all that is needed to invalidate their results beyond chance.
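
    The 1-to-100 spectrum above can be turned into a toy model to see what "oscillating" would do to test outcomes. This is only a sketch under loud assumptions: the listener's chance of a correct answer is taken to follow a logistic curve centered on the hypothetical transition point of 70, the slope value is invented, and the test is a 16-trial forced choice where 12 or more correct counts as a pass (the smallest score a pure guesser reaches less than 5% of the time). Nothing here is measured data.

```python
from math import comb, exp

def p_correct(level, midpoint=70.0, slope=5.0):
    """Assumed logistic curve for a two-choice test.

    Near level 1 the listener is almost always right (close to 1.0);
    near level 100 they are guessing (close to 0.5). `midpoint` and
    `slope` are illustrative assumptions, not measured values.
    """
    return 0.5 + 0.5 / (1.0 + exp((level - midpoint) / slope))

def pass_probability(level, n=16, k=12):
    """Chance of scoring at least k correct out of n trials at this level."""
    p = p_correct(level)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    for level in (1, 50, 60, 70, 80, 100):
        print(f"level {level:3d}: correct per trial = {p_correct(level):.2f}, "
              f"passes 16-trial test with probability {pass_probability(level):.2f}")
```

    On these assumptions, a listener sitting exactly at the transition point is right 75% of the time per trial yet passes a 16-trial test only about 63% of the time, so a panel of such listeners would produce a mix of passes and nulls without anyone's hearing having changed.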
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  8. #8
    Addicted to Best!
    Join Date
    Jul 2010
    Location
    bathurst NSW
    Posts
    517
    Quote Originally Posted by amirm View Post
    Here is some more food for thought and what I have experienced in real life, taking these tests.

    Let's assume there is a spectrum from 1 to 100. At 1, everyone hears a solid difference with 100% confidence. And at 100 the opposite is true: no one hears anything no matter what.

    Let's further assume that the transition point is at 70, where things start to tilt from one direction to the other.

    What happens to your evaluation of what you are hearing as you climb toward 70? We know from electronic circuits called comparators that they become unstable in that area and their function ceases to work. My brain does the same thing. It thinks it hears something but can't swear by it. So it vacillates especially when I am being tested and have to take sides.

    My theory is that blind tests are unreliable when used in that transition point. Our ear and brain combination are simply not designed to give binary answers in that area that the test requires.

    Got the concept, but WHY does that happen? The starting point you are making is that because it is a DBT, when we reach the 70% mark we oscillate.

    IF the stimulus is at the 70% mark, why does it not apply in a non dbt situation? Why in a sighted audiophile condition does oscillation not occur?

    In other words, why is the sighted conclusion more reliable?

    The above is why I would like to know how we can quantify where in that spectrum the test is.
    For sure, you gotta test the correct thing!

    All of this is compounded by the inductive logic people want to use to take the results of one blind test and apply them to all people, all content and all equipment. We know that no test is applicable that way when it produces negative results. It is tempting however to say, "well, 500 people couldn't hear the difference statistically, therefore no one can." I say before you can say this, you at least need to demonstrate whether you were or were not in the 70% area. Because if you are in the 70% area, your testers were "oscillating" in how they were voting, which is all that is needed to invalidate their results beyond chance.
    I agree that these conclusions are pushed past their comfort zone, but am wary about the 70% area you are emphasising again. Again, if we cannot overcome this 'problem' with a dbt, I am curious what argument there might be that normal, non-rigorous audiophile listening gives more reliable results.

    The point may be valid, but be wary about throwing out the baby with the bathwater by going to usual auditioning procedures???

  9. #9
    WBF Founding Member Ron Party's Avatar
    Join Date
    Apr 2010
    Location
    Oakland, CA
    Posts
    2,189
    Amir, are you not casting doubt on the whole concept of JND?
    Peace.

    Ron Party

  10. #10
    WBF Founding Member and Super Moderator JackD201's Avatar
    Join Date
    Apr 2010
    Location
    Manila, Philippines
    Posts
    5,792
    Quote Originally Posted by terryj View Post
    Ok, what about the more *usual* type of test (hmm, IS it more usual?).

    Specific guy A says 'I hear yada yada', which according to the other side is 'debatable' at least.

    So Specific guy A does the test, in his own home, on his own gear and now fails to hear what he said he heard.

    Ok maybe that is not so usual at all, but as a thought experiment how far does it go to answer your considerations??
    Then I learn more about my personal limits don't I? I can't see how that can be a bad thing. Still, saying I tested myself and failed doesn't mean everybody will fail too. I'd fail Amir's artifact test for the simple reason I can't put a name to the artifacts. If I don't know what I'm listening for and don't know what to name it, what answer could I give that could be correlated by Amir? If I did hear artifacts then I'd have to resort to using whatever language I have available using words like edge, blur or whatever nebulous descriptor the objective types absolutely can't stand because of the lack of specificity. Is that the subject's fault? I say hell no. That's the tester's fault.

    The great irony of DBTs is that those who herald them, sometimes to the point of treating them as the be-all and end-all, always say things like "I do not hold hearing in so high regard". Well, DBTs are listening tests. Hearing can't be that bad then. DBTs have been and always will be used for verification of audibility. That's it. The output will always state the confidence level. At a confidence level of 95%, one could have counter occurrences upwards of 35%. Statistics is a science, but one must remember that the math is appreciated quite differently by the layman. There are many cases of wrongful convictions based on expert testimony misunderstood by the juries. 1 + 1 is not equal to two a lot of the time, just as a confidence level of 95 does not guarantee a 95% outcome.
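
    The "counter occurrences" point above can be checked for an idealized forced-choice test. How many wrong answers a listener can give and still pass at the 5% level depends on the trial count. This is a sketch assuming independent trials and a one-sided binomial criterion; the trial counts and function names are chosen only for illustration and do not describe any particular published test.

```python
from math import comb

def critical_correct(n, alpha=0.05):
    """Smallest passing score at level alpha; n + 1 if the test is unpassable."""
    total, k = 0, n + 1
    for i in range(n, -1, -1):
        total += comb(n, i)          # outcomes with at least i correct
        if total / 2**n > alpha:     # a pure guesser reaches this too often
            break
        k = i
    return k

def max_wrong_fraction(n, alpha=0.05):
    """Largest share of wrong answers that still passes, or None if unpassable."""
    k = critical_correct(n, alpha)
    if k > n:
        return None
    return (n - k) / n

if __name__ == "__main__":
    for n in (10, 16, 25, 50, 100):
        k = critical_correct(n)
        print(f"{n:3d} trials: need {k} correct, "
              f"so up to {max_wrong_fraction(n):.0%} wrong still passes")
```

    Under these assumptions, a 16-trial test passes with 12 correct (25% wrong), and a 100-trial test passes with 59 correct (41% wrong). So a result reported at 95% can indeed sit alongside a large share of wrong answers, more so as the number of trials grows.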

    Now I have absolutely no problem accepting the results of DBTs as long as the conclusion presents the statistical parameters. I will not however accept the outcome as absolute, for reasons I hope I have made clearer.
