Objectivist or Subjectivist? Give Me a Break

When it comes to sources and amps, I think it should be possible to run a comprehensive battery of 'distortion' tests (THD, IMD, transient signals, and linear distortions such as frequency/phase response) and, based on psycho-acoustic models, to come up with a rough measure of the worst-case audibility of every deviation from perfection.
Ah, a nice, nebulous aspiration. Have you any examples of this being done in audio, or any detail on how it would be done? Saying "it should be possible" really doesn't give the aspiration any substance.
As in engineering, you can't test a complete complex system exhaustively - far too many permutations, variables and dimensions - but you can test each component or sub-system exhaustively. So you can only crash test a few cars, for example, giving only one or two test cases for each sub-system. But you can test each sub-system (e.g. anti-lock brakes) in a test rig many more times.
Again, you need to put some detail together about what the acceptable measurements are. Do you accept John Westlake's menu of measurements for an amplifier? Have you ever seen such a set of measurements published, produced or performed on any amplifier? Where does this leave the measurements approach if this is what it takes but nobody is doing this set of measurements?

If it could be established that certain sources were a hundred times better than they needed to be to be audibly 'perfect', at least they could be 'written out of the equation'. Doing the same for amps into realistic speaker loads (essential) might also allow us to write them out of the equation, too. We could then concentrate on speakers which would be the fun part, it seems to me!
Again how would this "audibly perfect" measurement be established?

(But of course, I am ignoring the "euphonic distortion" aspect which says that some people actually prefer a non-perfect system. I am not yet convinced that this is not merely a preference of theirs from the selection of inherently compromised systems that they have heard, and that these same people wouldn't prefer a 'perfect' one if they heard it.)
Yes, it may be that certain distortions are more pleasing than others & preferred by some.
 

Well I'm specifically talking about existing psycho-acoustic models which are, presumably, based on physiology, listening tests and hopefully real science. So just as the lossy codec people really do use these models in a non-nebulous way to steer the design of their codecs, and to quantify their quality, we should be able to do the same with DACs and amplifiers - in effect regarding them as 'lossy codecs'. So no new listening tests would be needed, simply measurements and calculations. Given an adequate safety margin ('a factor of 100') it would be nice to think we could eliminate certain pieces of equipment from our enquiries.
 
You mean the MUSHRA listening tests that the codec people use? Yes, all for that (note to Tim, it includes positive & negative controls). But I thought you were suggesting measurements? Again you talk about measurements & calculations - what specific measurements exactly? Can you be a bit more specific, please - it seems all rather aspirational, with "should" & "could" lavishly sprinkled throughout.
 

Well I'd like to avoid listening tests if possible, and base this on the science in the existing literature - where there have been listening tests in the past and from which psycho-acoustic models have been developed.

Within lossy codecs, the algorithm makes a decision as to how best to divide up its available bits based on an existing model of human hearing. For example, if it knows that there is a frequency bin with a large signal within it, it can reduce the precision with which it encodes adjacent, quieter bins and the listener will not hear it - or at least will not find it offensive. Well our DAC/amplifier testing could be on similar lines. If we see a distortion in the output, we can make an assessment based on existing models of human hearing as to how audible and/or offensive it will be.
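To make that concrete, here is a minimal sketch of the sort of automated check described above, assuming a deliberately crude masking model: the 25 dB/octave slope, the 20 dB offset and the 40 dB safety margin (roughly 'a factor of 100' in amplitude) are illustrative numbers of my own, not values taken from any published psycho-acoustic model.

```python
import numpy as np

def crude_masking_threshold_db(f_spur_hz, f_masker_hz, masker_level_db):
    """Very rough masking estimate: the threshold falls off ~25 dB per
    octave of separation from the masker (assumed slope and offset)."""
    octaves = abs(np.log2(f_spur_hz / f_masker_hz))
    return masker_level_db - 20.0 - 25.0 * octaves

def worst_case_margin_db(spurs, f_masker_hz=1000.0, masker_level_db=90.0,
                         safety_db=40.0):
    """spurs: list of (frequency_hz, level_db_spl) distortion products.
    Positive result: some spur comes within the chosen safety margin
    (40 dB ~ a factor of 100 in amplitude) of the estimated threshold."""
    return max(level - (crude_masking_threshold_db(f, f_masker_hz,
                                                   masker_level_db) - safety_db)
               for f, level in spurs)

# Example: harmonic spurs measured at a DAC/amp output reproducing a
# 1 kHz tone at 90 dB SPL (made-up levels).
spurs = [(2000.0, 20.0), (3000.0, 10.0), (7000.0, 5.0)]
print(worst_case_margin_db(spurs))
```

A real implementation would swap crude_masking_threshold_db for a proper hearing model (the kind the codec designers already use), but the shape of the calculation - measure the spurs, weight them by predicted audibility, insist on a large margin - stays the same.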

If you doubt that such models already exist, then my suggestion is that progress will be made by developing these models (using listening tests if necessary), rather than attempting to use listening tests at a higher level. Restrict the use of listening tests to developing the model of human hearing, then use only automated testing on the DACs and amps.
 
The concept of using positive & negative controls is exactly for that reason - to validate that the test is also capable of revealing what it purports to test. If you don't understand that, then you don't understand what is a valid DBT & what isn't.

JJ probably can tell you more on this - he has mentioned it plenty of times!

The concepts of pre-screening participants for hearing ability, including participants from a variety of backgrounds and levels of listening experience, using a wide variety of media and audiophile and pro audio systems for playback (etc. etc....), and running hundreds of trials with dozens of subjects over the course of a year, well beyond the requirements of a statistically valid sample, exercise control not only over the unlikely possibility that one of those high-end systems might not be able to reveal differences between digital resolutions, or that someone might be biased against hearing any differences; they also exercise control over the unanticipated.

If you don't understand that, you probably shouldn't even enter into a discussion about research.

I don't know what controls Meyer and Moran had. But any control they put in to verify that differences were audible would have been deemed irrelevant to differences between Redbook and SACD by all hi-res believers. That's just the way it goes. Not to worry. It's just one study. A well-done study, but just one. All it says is that those people couldn't hear a difference on those systems playing that material. You can continue to believe in whatever you think you hear.

Tim
 
Sorry, but MUSHRA stands for MUltiple Stimuli with Hidden Reference and Anchor and is a methodology for the subjective evaluation of audio quality, used to evaluate the perceived quality of the output from lossy audio compression algorithms. Why do you think they include a hidden reference & anchors? What you're talking about is simple pre-screening & statistical sampling - it has nothing to do with in-built checks on the validity of the test itself: does the test actually reveal what it is expected to reveal, is it sensitive enough, is it administered in a way that doesn't skew it, etc. etc. Scientifically valid experiments need such a self-testing mechanism. Sorry if you took this up the wrong way, as you appear to have done.

But without the above self-testing controls, how does anyone know whether the equipment/set-up/administration etc. of the test was capable of revealing known differences?
 
The question remains: "Why do our perfect measurements yield such imperfect results?" Something is missing. From a measurement perspective, what is it and where did it go?
We can't hear it but we can measure it.
We can hear it but we can't measure it using our current technology.
"Those who seek the chaperone of science..." try to reconcile those two statements. (Assuming you find them at odds.)
 
No, Meyer and Moran demonstrated that listeners could not consistently differentiate between Redbook and SACD resolution. Period. The addition of MP3 samples to Meyer and Moran would not have made it more meaningful, it would have made it a different study altogether. The introduction of a control, something subtle but clearly audible, may have revealed participants with bad hearing or a bias against hearing any audio differences at all, but that was covered by the depth and breadth of the study, the screening of the participants and their sheer numbers.

There are few studies of audio perception with enough depth and discipline to be meaningful, but there are a few. Yes, it would be nice if there were more.

Tim

Well, no. Meyer and Moran simply demonstrated that under the conditions of their test listeners could not consistently hear a difference between SACD and that SACD run through a filter making it 16/44.1 resolution. Really that's all it says, and while it isn't nothing it also isn't very much. Also, as mentioned many times previously, there was no attempt to quantify statistically the chance that M and M missed a real difference within the setting of their study (even apart from having no positive control).
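For what it's worth, that missing calculation is easy to sketch. Here is a back-of-the-envelope example assuming a hypothetical run of 50 forced-choice trials and a listener who genuinely hears the difference 60% of the time; none of these figures are Meyer and Moran's.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_false_null(n_trials, p_true, alpha=0.05):
    """Chance that a listener who really detects the difference with
    probability p_true per trial still fails the significance criterion."""
    # Smallest score that would be called significant under pure guessing.
    k_crit = next(k for k in range(n_trials + 1)
                  if binomial_tail(n_trials, k, 0.5) <= alpha)
    # Probability of scoring below that threshold at the true detection rate.
    return 1.0 - binomial_tail(n_trials, k_crit, p_true)

print(prob_false_null(n_trials=50, p_true=0.6))  # ~0.67 with these assumptions
```

With those assumed numbers a real but subtle difference would go undetected roughly two runs in three, which is exactly why a null result needs an accompanying power or positive-control analysis before it can be read as "no audible difference".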

The scientific illiteracy of Americans has been bemoaned in many places other than here.
 
The question remains: "Why do our perfect measurements yield such imperfect results?" Something is missing. From a measurement perspective, what is it and where did it go?
We can't hear it but we can measure it.
We can hear it but we can't measure it using our current technology.
"Those who seek the chaperone of science..." try to reconcile those two statements. (Assuming you find them at odds.)

As has been pointed out, in sighted testing it is impossible to prevent people imagining differences that don't exist, and vice versa. This entirely explains the apparent discrepancy.

With this single observation, of course, the validity of 95% of all audio reviewing, advertising, manufacture etc. etc. evaporates. This seems absurd to most people. But the very existence of all that guff doesn't make it valid.
 
The question remains: "Why do our perfect measurements yield such imperfect results?" Something is missing. From a measurement perspective, what is it and where did it go?
We can't hear it but we can measure it.
We can hear it but we can't measure it using our current technology.
"Those who seek the chaperone of science..." try to reconcile those two statements. (Assuming you find them at odds.)

I'm not sure that it is as absolute as "can't measure it" - I just think that the classical measurements used in audio are extremely limited in what they can tell us. Look at the suite of measurements that John Westlake suggested for characterisation of an amplifier (note that these are mostly dynamic, not static tests). I imagine that neither the interest nor the money is available to do such a range of tests on amplifiers & hence we only ever see a partial series of tests. So it may not be that it "can't be measured" - it may well be that the interest/resources are not available to do a "proper" suite of measurements & hence we interpret this state of affairs as measurements not being capable of correlating with what we hear. However, I do feel that even with the full set of measurements more work would be needed to correlate measurements with hearing in order to reach a point where measurements have some predictive capability & can tell us how something may well sound.
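For contrast with the fuller dynamic suite mentioned above, here is a minimal sketch of one classic 'static' measurement, total harmonic distortion read off an FFT of a captured output; the test signal, sample rate and harmonic levels below are made up for illustration.

```python
import numpy as np

def thd_percent(signal, fs, f0, n_harmonics=5):
    """Ratio of harmonic amplitude to fundamental amplitude, in percent."""
    window = np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)

    def bin_amplitude(f):
        # Amplitude of the FFT bin nearest to frequency f.
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = bin_amplitude(f0)
    harmonics = [bin_amplitude(k * f0) for k in range(2, n_harmonics + 1)]
    return 100.0 * np.sqrt(sum(h ** 2 for h in harmonics)) / fundamental

# Example: a 1 kHz tone with small 2nd and 3rd harmonics added.
fs, f0 = 48000, 1000.0
t = np.arange(fs) / fs
captured = (np.sin(2 * np.pi * f0 * t)
            + 0.01 * np.sin(2 * np.pi * 2 * f0 * t)
            + 0.005 * np.sin(2 * np.pi * 3 * f0 * t))
print(thd_percent(captured, fs, f0))  # roughly 1.1 %
```

A single figure like this, taken with a steady sine, says nothing about dynamic behaviour into a real speaker load, which is rather the point being made above.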
 
I know what a control is, John. I understand the concept of testing audibility within the systems to make sure it is possible for the subject material to be audible. And as I already said, I don't know what controls M&M did or did not use. Do you? Or are you just looking for reasons to dismiss the study without further investigation? And what stimuli would you suggest putting in place to test whether a system would be able to reveal differences that are theoretically and measurably inaudible and which all lie above the known range of human hearing? None, I'm sure, that could not be dismissed by anyone who wants to reject whatever subsequent conclusions they dislike.

Here, this is not the study itself and it doesn't address your specific question, but it's a pretty good overview. I'm sure you'll find plenty to argue with: http://mixonline.com/recording/mixing/audio_emperors_new_sampling/

Tim
 
Tim, you have cited M & M as a valid, well-run test & yet don't seem able to deal with questions about its procedure & validity. Instead you treat any such questions as attempts to dismiss the study. Well, if the study isn't rigorous then it should be dismissed & certainly not cited as well run or valid. So if you don't know what controls, if any, were used, how can you stand over the test?

If you're going to run a test then run it properly. An excerpt from "METHODS FOR THE SUBJECTIVE ASSESSMENT OF SMALL IMPAIRMENTS IN AUDIO SYSTEMS INCLUDING MULTICHANNEL SOUND SYSTEMS":
"It must be empirically and statistically shown that any failure to find differences among systems is not due to
experimental insensitivity because of poor choices of audio material, or any other weak aspects of the experiment, before
a “null” finding can be accepted as valid. In the extreme case where several or all systems are found to be fully
transparent, then it may be necessary to program special trials with low or medium anchors for the explicit purpose of
examining subject expertise (see Appendix 1).
These anchors must be known, (e.g. from previous research), to be detectable to expert listeners but not to inexpert
listeners. These anchors are introduced as test items to check not only for listener expertise but also for the sensitivity of
all other aspects of the experimental situation.
If these anchors, either embedded unpredictably within the context of apparently transparent items or else in a separate
test, are correctly identified by all listeners in a standard test method (§ 3 of this Annex) by applying the statistical
considerations outlined in Appendix 1, this may be used as evidence that the listener’s expertise was acceptable and that
there were no sensitivity problems in other aspects of the experimental situation. In this case, then, findings of apparent
transparency by these listeners is evidence for “true transparency”, for items or systems where those listeners cannot
differentiate coded from uncoded version"
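As an illustration of the kind of check that excerpt calls for (my own construction, not the ITU procedure itself), the tally on the known-audible anchor trials can be tested against chance; the counts below are hypothetical.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided probability of scoring at least this well by guessing."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))

anchor_correct, anchor_trials = 54, 60        # hypothetical control-trial tally
print(f"p = {binomial_p_value(anchor_correct, anchor_trials):.1e}")
```

A convincingly small p-value on the anchor trials is what allows a null result on the main comparison to be read as evidence of transparency rather than of an insensitive set-up.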
 
And as I already said, I don't know what controls M&M did or did not use. Do you? Or are you just looking for reasons to dismiss the study without further investigation?
Tim

Are you suggesting they used controls not mentioned in the study itself or any of the subsequent discussions? Because there are no positive or negative controls mentioned anywhere, just the listeners themselves.

These problems with testing limitations exist not only in audio, but in essentially any area with pretensions to be "scientific". Do you have any idea how many medical treatments appear useful in well-done tests (and receive FDA or similar approvals) but fail in widespread use? For example, are you aware that no drug treatment is associated with improved survival in cardiac arrest?
 
...These problems with testing limitations exist not only in audio, but in essentially any area with pretensions to be "scientific". .......
The pretensions to be "scientific" seem to be more rampant on audio forums, however!
 
"All our knowledge has its origins in our perceptions."

Leonardo da Vinci
 

It's a reasonable point theoretically, John, and perhaps I'm not following you and/or your academic source, so let me break it down:

To determine if the systems, material or hearing of the subjects are sensitive enough to detect a difference between Redbook and SACD, you would insert a sound at random that is known to be detectable to expert listeners, then see if they detect it in the test. Have I got that right? And if the system/material/test is transparent enough to reproduce the anchor audibly, we assume they will be good enough to reveal differences between CD and SACD? Yes?

Let's not even fool with what is and is not "expert" to whom, right now. We agree that they are expert listeners. They detect an "anchor" that is barely audible, but they are still unable to consistently ID the difference between CD and SACD. If the control wasn't at least in the same range as the difference between CD and SACD, have you really learned anything? And the difference between CD and SACD is above 20 kHz. What control, known to be audible to expert listeners, but falling outside of the known range of human hearing, would you have recommended that M&M use?

Tim
 
Originally Posted by tomelex
......
Summing up, there is nothing in an audio "signal" that we can not measure well past what any ear can hear. What happens when our ears get ahold of it is preference.

...
It seems such a simple statement that it's easy to agree with, but again, are these measurements being done in a comprehensive way? If not then it really doesn't matter - it's just theoretical. If the reality is that it isn't being done in any comprehensive way then it is immaterial & of no use in a practical evaluation of audio quality for everyman. That's why most people use subjective listening for their evaluation of audio - the measurements just aren't there in a comprehensive way.
 
"All our knowledge has its origins in our perceptions."

Leonardo da Vinci

Perhaps a good place to remind ourselves of two axioms of current scientific thought:

1) We can't know everything.

2) Some things we "know" are wrong.

These are particularly true in biological systems.
 
It's a reasonable point theoretically, John, and perhaps I'm not following you and/or your academic source, so let me break it down:
The source is here: http://www.itu.int/rec/R-REC-BS.1116-1-199710-I/e
The MUSHRA methodology is recommended for assessing "intermediate audio quality"; for very small audio impairments, Recommendation ITU-R BS.1116-1 (ABC/HR) is recommended instead.

To determine if the systems, material or hearing of the subjects are sensitive enough to detect a difference between Redbook and SACD, you would insert a sound at random that is known to be detectable to expert listeners, then see if they detect it in the test. Have I got that right? And if the system/material/test is transparent enough to reproduce the anchor audibly, we assume they will be good enough to reveal differences between CD and SACD? Yes?
Partly: the anchor is randomly distributed through the listening tests & a statistical evaluation is done on its results, which can then be used to evaluate the sensitivity & validity of the whole test, including listeners, material & procedures.

Let's not even fool with what is and is not "expert" to whom, right now. We agree that they are expert listeners. They detect an "anchor" that is barely audible, but they are still unable to consistently ID the difference between CD and SACD. If the control wasn't at least in the same range as the difference between CD and SACD, have you really learned anything?
No, you haven't, & the test is invalid.
And the difference between CD and SACD is above 20 kHz.
Is it the only difference?
What control, known to be audible to expert listeners, but falling outside of the known range of human hearing, would you have recommended that M&M use?
I didn't design the test, so why ask me? That doesn't stop me asking questions about its validity, though. Tests are either valid or not - if it's not possible to use appropriate controls then the test is not valid, is it?

Tim
 
