Page 4 of 28
Results 31 to 40 of 275

Thread: Do blind tests really prove small differences don't exist?

  1. #31
    Addicted to Best!
    Join Date
    Jul 2010
    Location
    bathurst NSW
    Posts
    517
    Hmm, how come ALL of the previous posts I quoted came up again, even tho I had answered them before and this is a completely new post??

    Quote Originally Posted by amirm View Post
    Even though we focus on two outcomes: hearing a difference and not, I am saying there is another which is the transition point. In this area, testers by definition are unsure, so sometimes they vote correctly and other times not.
    I am pretty sure this is where I have had problems with your hypothesis before. I think it lies in THIS sentence...In this area, testers by definition are unsure, so sometimes they vote correctly and other times not. I do get your idea of oscillation, analogous to the 1 or 0 point in digital (or the transition between them).

    See, I take it that they vote correctly each time. THAT there was a difference each time, and they did not hear it, is neither here nor there.....????? If they heard it they voted so, if they did not hear it they voted so. One time they heard it (say), voted 'yes', the next time they did not hear it and voted 'no'. You say 'Well, they SHOULD have heard it, so we are getting a null result unfairly' (or words to that effect).

    I am saying 'We are getting a null result fairly, because sometimes they heard it and sometimes they didn't'. Same set of tests, exactly the same result and conclusion (null result), yet with a completely different 'spin' (if you will) on it.
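One way to put numbers on this 'sometimes they heard it' picture is a simple guessing model. This is an assumption for illustration, not something either poster proposed: on the trials where the listener genuinely hears the difference they answer correctly, and on the rest they guess with 50/50 odds. The function name `abx_p_correct` is hypothetical.

```python
def abx_p_correct(hear_fraction: float) -> float:
    """Hypothetical model: probability that one ABX answer is correct if the
    listener truly hears the difference on `hear_fraction` of the trials and
    guesses 50/50 on all the others."""
    if not 0.0 <= hear_fraction <= 1.0:
        raise ValueError("hear_fraction must be between 0 and 1")
    # Heard trials are always right; unheard trials are right half the time.
    return hear_fraction + (1.0 - hear_fraction) * 0.5


for h in (0.0, 0.2, 0.5, 1.0):
    print(f"hears it on {h:.0%} of trials -> {abx_p_correct(h):.2f} correct")
```

Under this model a listener who hears the difference on only 20% of trials still scores about 60% correct, which is the kind of weak but real effect that is hard to separate from chance in a short test.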

    It seems you are setting it up so that there is a right or wrong answer, i.e. they SHOULD have heard it or some such? That, I thought, would be the essential question here: CAN they hear it or not, yes or no?

    They also tend to second guess themselves because they are being asked to give a Yes/No answer. I believe both of these factors push the statistics to be non-conclusive. And since non-conclusive is taken as "statistically can't tell the difference" I theorize that we tend to opt for negative findings in this situation.
    HOW do you know they tend to second guess themselves?? Maybe they do or do not, it just reads as a statement of fact (which it could be for all I know).

    IF they reported a sense of second guessing, well, to me that is just as validly explained by saying the two stimuli were so close together that they had great difficulty telling them apart. No need to conclude anything more than that.

    I further am interested in how we quantify the region. One of the ways to do that in my opinion is to scrutinize the test itself. In compression testing, we have a set of audio tests we know to be revealing.
    How do we know they are revealing if not by blind tests? IS that how 'we know'??

    The content can be shown mathematically to be revealing of differences.
    I don't quite understand 'mathematically' here, unless you mean things like 'only 10 db down' or stuff like that. I.e., are we using JNDs?? (which I further assume are established by large-scale blind testing?? maybe not in the same sense as we use it for audiophile stuff, just 'unknown stimuli given to listeners to see what *we* can hear or not hear')



    I just get the idea that you are arguing from 'they SHOULD have heard it yet didn't' and are using that to hang everything off. Wasn't there something about a Swedish Radio codec that is used to show why DBTs cannot find small differences? Is that the type of example you have in mind here?

  2. #32
    Senior Member
    Join Date
    Sep 2010
    Posts
    144
    Quote Originally Posted by amirm View Post
    Even though we focus on two outcomes: hearing a difference and not, I am saying there is another which is the transition point.
    Actually, it's way more complex than that. For a given trial of a multi-trial test, it is not just a situation of audible/inaudible, or a three-state situation as you postulate. To be as specific as possible, let's talk ABX, where at each trial, one must choose whether X is A or B. For an inaudible difference, the probability of correctly identifying X is 0.5 (same as correctly guessing the state of a flipped coin). For an audible difference, the probability of correctly identifying X is greater than 0.5, and less than or equal to 1. As this probability goes from 0.5 to 1, one could say the effect is more and more audible, and this probability is a continuum within that range.
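A minimal sketch of the arithmetic behind this, assuming the trials are independent and each one is answered correctly with the same probability p (the function name `prob_at_least` is illustrative): the chance of scoring M or more out of N is a binomial tail, and with p = 0.5 it is the one-sided p-value of an ABX run.

```python
from math import comb


def prob_at_least(m: int, n: int, p: float) -> float:
    """Probability of getting at least m of n ABX trials right when every
    trial is independently correct with probability p (binomial tail)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


# Hypothetical run: 12 of 16 correct. With p = 0.5 (pure guessing) the
# tail probability is the one-sided p-value of the run.
p_value = prob_at_least(12, 16, 0.5)
print(f"P(>= 12 of 16 by guessing) = {p_value:.4f}")
```

For example, 12 correct out of 16 happens by pure guessing about 3.8% of the time (2517/65536).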

    Let's assume we are clairvoyant, such that we actually know what this probability of correctly identifying X is, and we do a test of N trials, for which the test subject gets M out of N correct. According to some criterion, we reach one of two conclusions:

    1) The difference is audible
    2) The difference is inaudible

    What we'd like to know is:

    A) The probability that the test reaches the conclusion that the effect is audible when it's really inaudible.
    B) The probability that the test reaches the conclusion that the effect is inaudible when it really is audible.
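Both probabilities can be computed directly once a pass criterion is fixed. The sketch below assumes the common convention of declaring 'audible' when the score would occur by guessing with probability at most alpha = 0.05; the names `critical_m` and `error_rates` and the example hit rate p = 0.6 are hypothetical choices for illustration.

```python
from math import comb


def prob_at_least(m: int, n: int, p: float) -> float:
    """Binomial tail: P(at least m of n correct) with per-trial probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def critical_m(n: int, alpha: float = 0.05) -> int:
    """Smallest score m that guessing (p = 0.5) reaches with probability <= alpha."""
    # m = n + 1 gives an empty sum (probability 0), so the loop always returns;
    # a result of n + 1 means the test is too short to ever pass at this alpha.
    for m in range(n + 2):
        if prob_at_least(m, n, 0.5) <= alpha:
            return m
    raise ValueError("alpha must be non-negative")


def error_rates(n: int, p_true: float, alpha: float = 0.05):
    """Return (criterion, A, B) for an n-trial ABX test.
    A: declared audible although inaudible (p = 0.5).
    B: declared inaudible although the true hit rate is p_true."""
    m = critical_m(n, alpha)
    a = prob_at_least(m, n, 0.5)
    b = 1.0 - prob_at_least(m, n, p_true)
    return m, a, b


print(error_rates(16, 0.6))
```

With 16 trials the criterion works out to 12 correct, so A is about 0.038, but for a listener whose true hit rate is 0.6, B comes out at roughly 0.83: the test would call a real, if small, difference 'inaudible' more than four times out of five.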

    Now, for the fairest possible test, what should the relationship between the probabilities expressed in A and B above be? That's the question Leventhal asks, and he derives results that approximate the desired situation.
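One simple way to express the 'fair test' idea, offered as an illustration rather than as Leventhal's exact procedure, is to pick the pass criterion that makes the two error probabilities as close to each other as possible for a given number of trials and an assumed true hit rate. `balanced_criterion` is a hypothetical helper.

```python
from math import comb


def prob_at_least(m: int, n: int, p: float) -> float:
    """Binomial tail: P(at least m of n correct) with per-trial probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def balanced_criterion(n: int, p_true: float):
    """Hypothetical helper: over every possible pass score m (0..n+1), pick the
    one whose error probabilities A (false 'audible') and B (false 'inaudible'
    at hit rate p_true) are closest to equal. Returns (m, A, B)."""
    best = None
    for m in range(n + 2):
        a = prob_at_least(m, n, 0.5)
        b = 1.0 - prob_at_least(m, n, p_true)
        if best is None or abs(a - b) < abs(best[1] - best[2]):
            best = (m, a, b)
    return best


m, a, b = balanced_criterion(16, 0.6)
print(f"pass at {m}/16: A = {a:.2f}, B = {b:.2f}")
```

For 16 trials and an assumed hit rate of 0.6, the balanced criterion is 9 correct, and both errors are still large (about 0.40 and 0.28). Getting both down to acceptable levels takes more trials, not just a different cutoff.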

    It seems to me that these questions are the crux of this thread - or at least should be, as the previous statements of its purpose seem somewhat confused to me.
    Last edited by andy_c; 05-14-2011 at 11:29 PM.

  3. #33
    Site Founder And Administrator (amirm)
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,404
    Quote Originally Posted by terryj View Post
    HOW do you know they tend to second guess themselves??
    'Cause I have done it! And many times!

    Let me expand. When I am taking a test and hear a small difference, I run an experiment on myself: I pretend there is no difference and listen again. By doing so, I am able to erase the difference. Then I put aside that intent and hear the difference again. So I know for sure that, at least for me, I can convince myself either way when the difference is subtle. I have had this experience in formal tests and in informal ones I have run myself.

    Maybe they do or do not, it just reads as a statement of fact (which it could be for all I know).
    Again, the experience is real for me. Indeed, this is the motivation for me to probe here. I want to understand how the above factor affects testing results. Note that this is not "guessing." I am not flipping a coin because I just can't tell. I hear a difference and then wonder if I shouldn't have. I then listen again and lo and behold, there is no difference.

    How do we know they are revealing if not by blind tests? IS that how 'we know'??
    I don't know. It is a quandary which I was hoping we could at least partially figure out by discussing it. I used the analogy before of how scientists find out what happened thousands of years ago in the galaxy. They find secondary evidence to get data, solving the problem of us not having been there when the event occurred. I have no idea if this is a good analogy or not, but it is an example of solving impossible problems.

    I don't quite understand 'mathematically' here, unless you mean things like 'only 10 db down' or stuff like that.
    I mean essentially that. Let's say I can show that distortion exists at 50 dB down but only for 100 milliseconds of a transient. We can hear distortion that high, but not necessarily when it lasts so briefly. This is a small but measurable level.
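For the 'shown mathematically' side, the usual decibel conversions make 'X dB down' concrete. This is a sketch assuming '50 dB down' is relative to the signal level and that the 100 ms is counted at a 44.1 kHz sample rate (the rate is an assumption; the post does not give one). The function names are illustrative.

```python
def db_down_to_amplitude_ratio(db_down: float) -> float:
    """Amplitude (voltage) ratio of a component db_down decibels below the signal."""
    return 10 ** (-db_down / 20)


def db_down_to_power_ratio(db_down: float) -> float:
    """Power ratio of a component db_down decibels below the signal."""
    return 10 ** (-db_down / 10)


def duration_in_samples(seconds: float, sample_rate_hz: int) -> int:
    """Number of samples covering a span of audio at the given sample rate."""
    return round(seconds * sample_rate_hz)


amp = db_down_to_amplitude_ratio(50)          # ~0.00316 of the signal amplitude
power = db_down_to_power_ratio(50)            # 1e-5 of the signal power
samples = duration_in_samples(0.100, 44_100)  # assumed 44.1 kHz sample rate
print(f"{amp:.3%} amplitude, {power:.0e} power, {samples} samples")
```

So a component 50 dB down is about 0.32% of the signal amplitude (one hundred-thousandth of its power), and a 100 ms transient spans about 4,410 samples at 44.1 kHz.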

    I just get the idea that you are arguing from 'they SHOULD have heard it yet didn't' and are using that to hang everything off. Wasn't there something about a Swedish Radio codec that is used to show why DBTs cannot find small differences? Is that the type of example you have in mind here?
    I have argued against that test case really existing, so I don't want to go there. My angle is not that there is some rare listener who would have heard the issue, as was the case above. I am saying that whoever we have picked may be operating, as I have many times, in the area of uncertainty.
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  4. #34
    Addicted to Best!
    Join Date
    Jul 2010
    Location
    bathurst NSW
    Posts
    517
    (while I remember, looking at your time of the post, you may have missed a better response from andy c above)

    Gotcha on the second guessing, not a finding as such but personal experience. It could be possible you are simply 'at that end of the spectrum'?? Ie, entirely possible that few would be as 'analytical' or self examining as you??

    Anyway, re the second guessing (did I really hear it? pretend it is not there and see what happens) I have to be honest and say you are starting to be even less convincing for me now.

    A personal example of what I think is a similar psychological phenomenon, I might sit you down and play you the latest song which has absolutely blown me away. I mean MAN, I LOVE it, have not stopped listening to it since I came across it ok?

    When you are in the hot seat, I look out of the corner of my eye and, for whatever reason, I begin to have the slightest thought that you may not like it as much as I do. I find that thought quickly grows, to the extent that I am uncomfortable 'forcing' this sound on you, and it does not even sound good to me anymore!!!

    Others have said something like 'that happens with movies too' etc etc, I am sure we all have our own examples.

    I see that as being entirely similar to your example, in other words it is *trivially* easy to completely change the way we perceive things. Ok, maybe in mine it took an external stimulus (playing the track for someone), but I don't see that the perception changed as proof of much at all.

  5. #35
    Site Founder And Administrator (amirm)
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,404
    We can only share personal experiences so yes, that is mine.

    Here is another situation. In a third-party blind test of video codecs, our video codec actually garnered a higher score than the hidden reference in one of the tests. That should be impossible, right? The test was a butterfly test where the video was split, with the left side always the original and the right side either the sample under test or the original itself. People were to score, from 1 to N, how close the right side was to the original. Somehow, we managed to get higher scores than the hidden reference.

    Digging into why, we realized that our sample had possibly filtered out some of the noise, so it garnered a higher preference because it actually made the outcome perceptually better than what we started with. That revelation came from analyzing the results. That is what I am trying to do here: looking underneath the test results and seeing if there could be some aberrations. Clearly the blind test results above were inaccurate in claiming a degraded sample was more real than the original sample right next to it. Humans scored and scored wrong.

    It is not often that we have outcomes like above where we know the results have to be wrong. When we don't get lucky that way, how can we still determine the levels of accuracy in the test?
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  6. #36
    WBF Technical Expert (Computer Audio)
    Join Date
    Jul 2010
    Posts
    678
    Humans scored and scored wrong.
    They didn’t. They never do. Don't blame them for the flaws in your experimental design.
    You asked for preference.

    Likewise there is a test floating around at Hydrogenaudio.
    Two identical pieces of music, which one do you prefer sound quality wise?
    Most prefer “B”, indeed a low bit rate MP3 of the original.
    As the original recording is very sharp the high rolloff of the MP3 is preferred.

  7. #37
    Site Founder And Administrator amirm's Avatar
    Join Date
    Apr 2010
    Location
    Seattle, WA
    Posts
    7,404
    Quote Originally Posted by Vincent Kars View Post
    They didn’t. They never do. Don't blame them for the flaws in your experimental design.
    You asked for preference.
    First, this was a test conducted by DVD Forum for selection of video codecs for high-definition discs. That test led to adoption of VC-1 in both HD DVD and Blu-ray. So it was not our test.

    Second, the scoring was to be relative to the butterfly image to the left. They were not asked to vote for preference. They were asked how close that one on the right came to the one to the left. When we were on the right, we were voted to be closer to the original than the reference itself.

    Note that the hidden reference did not score perfectly. It never does in these tests. It always rates lower than perfect because some people think they are being tricked and vote it down as not being as good as the original. So our high watermark is always this value, not a perfect high score.
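The 'high watermark' point suggests normalizing each sample's mean score by the hidden reference's mean score rather than by the top of the scale. A minimal sketch; the ratings below are made-up numbers for illustration, not data from the DVD Forum test, and `relative_to_hidden_reference` is a hypothetical name.

```python
from statistics import mean


def relative_to_hidden_reference(scores, ref_key="hidden_reference"):
    """Divide each sample's mean rating by the hidden reference's mean rating,
    so 1.0 means 'rated as close to the original as the original itself'."""
    ref_mean = mean(scores[ref_key])
    return {name: mean(vals) / ref_mean
            for name, vals in scores.items() if name != ref_key}


# Made-up ratings on a 1-5 scale, for illustration only.
ratings = {
    "hidden_reference": [5, 4, 5, 4, 4],
    "codec_a": [5, 4, 5, 5, 4],
    "codec_b": [3, 4, 3, 4, 3],
}
for name, rel in relative_to_hidden_reference(ratings).items():
    flag = "  <- above the hidden reference" if rel > 1.0 else ""
    print(f"{name}: {rel:.2f}{flag}")
```

A ratio above 1.0, as for `codec_a` here, is the anomaly described above: a processed sample rated closer to the original than the original itself.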

    Likewise there is a test floating around at Hydrogenaudio.
    Two identical pieces of music, which one do you prefer sound quality wise?
    Most prefer “B”, indeed a low bit rate MP3 of the original.
    As the original recording is very sharp the high rolloff of the MP3 is preferred.
    This was a three-way test, not two-way. The original was always on screen on the left. The right was either the compressed sample or the original. They did not play two degraded samples and ask which one was better.
    Amir
    Founder, Madrona Digital Audio, Video, Home Automation
    Contributing Editor, Widescreen Review Magazine

  8. #38
    Quote Originally Posted by andy_c View Post
    This is called special pleading.
    ....as I noted way back in one of my first posts on this thread ;>

    Amirm, might I suggest you simply contact your former co-worker JJ and ask him where the evidence for JND and the efficacy of DBTs comes from? I'm kind of shocked that much of this seems new to you.


    Also, the status quo for determining audio quality hasn't been demonstrated to be a 'better way' than one based more primarily on DBTs. In fact you might want to ask Sean Olive about that...I hear he has his own forum now ;>

  9. #39
    Quote Originally Posted by andy_c View Post
    It is evidence of measurable differences, but I assume we're talking audible differences. One could say the measurements are evidence of potential audible differences though.

    This thread is a bit confusing to me because of its title, "Do blind tests really prove small differences don't exist?". Of course in general it's impossible to prove a negative as it were, so the answer "no" is almost a given. Actually, the title is a non sequitur if one examines it critically, because the idea that one could "prove" that something which really does exist does not is nonsense. But it seems to me the intent of the discussion is to determine whether blind tests mask small audible differences. In other words, that a blind test might reach the conclusion that something is not audible when it actually is.

    This issue has been thoroughly covered by Leventhal. The only freely available source for his work that I know of is this group of Stereophile Letters to the Editor. He did some peer-reviewed stuff for the AES that covers the issue in excruciating detail, as well as being very fair, but one must unfortunately purchase the articles from the AES.
    Leventhal's work in this area is mainly a reminder that Type 2 errors exist (not just Type 1) and should be factored into the statistics, and that the statistical power of a test is important too.

    Neither of these suggests that DBT is intrinsically incapable of 'detecting' small differences. It's really just a call to do them right, and not to let conclusions exceed the message of the data.
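To keep conclusions within the data, a null ABX result can be reported as an upper bound on the listener's true hit rate rather than as 'no difference'. The sketch below uses an exact one-sided binomial (Clopper-Pearson style) bound found by bisection; the function names and the choice of alpha = 0.05 are assumptions for illustration.

```python
from math import comb


def prob_at_most(m: int, n: int, p: float) -> float:
    """Binomial CDF: P(at most m of n correct) with per-trial probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))


def upper_bound_p(m: int, n: int, alpha: float = 0.05, iters: int = 60) -> float:
    """One-sided (1 - alpha) exact upper confidence bound on the per-trial hit
    rate after m of n correct: the p at which P(X <= m) falls to alpha.
    P(X <= m) decreases as p grows, so bisection on [m/n, 1] finds it."""
    if m >= n:
        return 1.0
    lo, hi = m / n, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if prob_at_most(m, n, mid) > alpha:
            lo = mid  # data still fairly likely at this p: bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2


bound = upper_bound_p(8, 16)  # a chance-level null result
print(f"8/16 correct: true hit rate could still be up to about {bound:.2f}")
```

Eight correct out of sixteen is exactly chance level, yet the bound it produces is well above 0.5, so the data cannot rule out a listener who is right considerably more often than by guessing.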

  10. #40
    Quote Originally Posted by sasully View Post
    Leventhal's work in this area is mainly a reminder that Type 2 errors exist (not just Type 1) and should be factored into the statistics, and that the statistical power of a test is important too.

    Neither of these suggests that DBT is intrinsically incapable of 'detecting' small differences. It's really just a call to do them right, and not to let conclusions exceed the message of the data.
    This was the point I was making in the comments of mine that were referenced at the start of this thread: that when the differences are small, and particularly when such differences will not be audible with all program material, you need to run a large number of trials in order to be able to bring the power of statistical analysis to bear.

    John Atkinson
    Editor, Stereophile
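To put a number on 'a large number of trials', here is a sketch that searches for the smallest N at which an ABX test with a 0.05 criterion reaches a target power (80% here) against an assumed true hit rate. The names and the 0.8/0.05 settings are illustrative assumptions, not figures from the post.

```python
from math import comb


def prob_at_least(m: int, n: int, p: float) -> float:
    """Binomial tail: P(at least m of n correct) with per-trial probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def critical_m(n: int, alpha: float) -> int:
    """Smallest score m with P(score >= m | guessing) <= alpha, built by adding
    guessing probabilities from the top score downward."""
    tail, m = 0.0, n + 1
    while m > 0:
        next_tail = tail + comb(n, m - 1) * 0.5**n
        if next_tail > alpha:
            break
        tail, m = next_tail, m - 1
    return m


def power(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Chance an n-trial ABX test passes when the true hit rate is p_true."""
    return prob_at_least(critical_m(n, alpha), n, p_true)


def trials_needed(p_true, alpha=0.05, target_power=0.8, max_n=1000):
    """First n whose power reaches target_power, or None if none up to max_n.
    max_n stays at 1000 because comb(n, k) much beyond that can overflow a float."""
    for n in range(1, max_n + 1):
        if power(n, p_true, alpha) >= target_power:
            return n
    return None


for p in (0.9, 0.7, 0.6):
    print(f"hit rate {p}: about {trials_needed(p)} trials for 80% power")
```

For a hit rate of 0.6 this lands well above 16 trials. Because the power of a discrete test is not monotonic in N, the function returns the first N that reaches the target rather than assuming every larger N does too.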

