Comparative Listening Tests

Ron Resnick · Apr 20, 2017

In a post on Al M.’s thread “ZenWave Audio D4 Interconnect” Peter A., describing a comparative listening test at Al’s house, wrote:

. . . Despite your good descriptions of the differences between the D4 and Monster cables, I remain somewhat confused by the events of the recent audition. You essentially conducted an A/B/X test for me and David. Despite me being completely sure that the X cable was the same as B, I failed that test. That opened my eyes to the realities of such tests. Then, later in the evening, when we did the test again with different music, you switched the test without telling us. It was basically an A/B/C test, because you rotated the Tube Traps when we were outside of the room for the X part, changing it into C. That only confused me more.

. . .

Finally, I'd like to add a few thoughts about the A/B/X testing method. Not knowing what I was hearing made for a very intense listening session. During the A/B part, obviously we knew there were two different cables being heard, and we simply described the differences, if any, between the two. That part was fairly easy. The test changed when we heard the third X part. We did not know if you switched cables again or not. Now we were being asked to remember the sound of the previous two cables while listening to unfamiliar music and identify if the third cable was the same as the first or the second. I found this test to be fundamentally different from the first test. We were not listening for differences, but instead testing our recall ability. I thought I correctly identified it, but was clearly wrong. That knowledge of failing the test then influenced my listening for the rest of the evening and contributed to an overall very confusing listening session. I don't know much about blind testing. The fact that I failed it and that you are clearly certain which cable you prefer means either that this kind of testing is somehow flawed, . . . or that bias may enter into your decision, or something else entirely. I don't really know.

I think it is important to attempt to be intellectually honest. I want to commend Peter for being intellectually honest about a confusing situation in which he found himself with the A/B and A/B/X testing of Al’s interconnect cables.

While I have never met Peter in person, we have corresponded via e-mail and talked on the telephone extensively and frequently for several years now. I know Peter to be an extremely thoughtful, detail-oriented, careful and conscientious listener. Also, importantly, Peter (unlike me) regularly listens to live music. I am confident Peter's ears are better, and more accurate, than mine. So, after reading Peter’s post, I ask myself: “If Peter gets confused in a test like this, what do we, individually and collectively, even think we are doing when we compare products and listen for changes in our systems?”

I have no answers, only questions. There are a lot of opinions about the merits and problems of A/B and A/B/X testing, and about the pros and cons of short-duration A/B and A/B/X comparisons versus long duration, spend-weeks-with-a-product listening with no back-and-forth comparison.

I think Peter’s experience inclines me to view even more skeptically long-duration "comparisons." As Peter wrote, the comparison, even in the short-duration time-frame of an A/B/X test, became a test of “recall ability,” not “listening for differences.” So what realistic hope does an audiophile have trying to remember how his system used to sound after he has been listening to a new component for weeks or even months?

Are we partially, or even completely, deluding ourselves when an audiophile visits a friend’s house to listen to music, and then returns weeks later to see if the audiophile can hear a difference wrought by some change the friend made in his system? (Let's not even think about the audiophile's differences in mood, restfulness, alcohol consumption, hunger level, stress, etc., between the two listening sessions.) After listening again weeks later, the audiophile reports hearing a significant difference in response to a minor tweak in his friend’s system (“the soundstage opened up significantly,” “the midrange glare is much less evident,” “the noise floor is much lower,” “the highs are more extended,” etc.). But if we can't even make reliable comparisons during the course of a single day, how can we possibly think that we can remember accurately what our friend’s system sounded like weeks ago? (Of course, critics of A/B and A/B/X tests argue there are problems and inherently confusing issues with short-term comparison tests which are solved by long-duration auditioning.)

I am not suggesting we stop auditioning components, stop pursuing tweaks or stop listening for significant or subtle differences. I am suggesting that perhaps we should be more realistic and circumspect -- and more skeptical -- about our expressed conclusions. We should attempt to do the best we can do, and to try to remain as intellectually honest as possible, but perhaps we should acknowledge that we may be fooling ourselves about some of our listening conclusions.

RogerD · Apr 20, 2017

In the end it is an extremely personal experience. I have found I focus only on one marker and that is clarity; otherwise it becomes a crap shoot. YMMV.

Joe Whip · Apr 20, 2017

Great post, Ron. I have been involved in several AB and ABX listening tests. They can be quite stressful, usually as a result of our own doing. I prefer to use music which I am very familiar with. Usually 2 minutes per track is enough. There are times when the differences are obvious. Most times, they are very close. In those cases, I take the approach that if I have to think really hard about whether I hear a difference, or think I do but am not sure, I simply am not hearing a difference. Recalling auditory memory is a difficult task, as you note.

Al M. · Apr 20, 2017

Joe Whip said:
Great post, Ron. I have been involved in several AB and ABX listening tests. They can be quite stressful, usually as a result of our own doing.

I agree that blind AB and ABX listening tests can induce psychological stress, as well as expectations, which can be detrimental to our cognitive abilities (I should have remembered that in the above test). Perhaps the best way may be to listen for some extended time (an hour or more) with one component in a stress-free manner, and then switch back to the old component for a while to assess if there is any meaningful difference. And repeat. It may take some time to figure out where the differences really are, and if they are real, you'll eventually lock into them and be easily able to confirm that they are there. Once you think you have firmly figured out a difference that is meaningful to you, then a blind test to confirm it might make more sense -- if it's even needed at that point.

microstrip · Apr 20, 2017

Ron Resnick said:
(...) I am not suggesting we stop auditioning components, stop pursuing tweaks or stop listening for significant or subtle differences. I am suggesting that perhaps we should be more realistic and circumspect -- and more skeptical -- about our expressed conclusions. We should attempt to do the best we can do, and to try to remain as intellectually honest as possible, but perhaps we should acknowledge that we may be fooling ourselves about some of our listening conclusions.

IMHO skepticism should be on the side of the reader. I hope people feel free enough to express their opinions without fear that they are wrong. Opinions in the case you address are mainly driven by preference and clearly detailed. Also IMHO the described tests are meaningless.

Preference is the trigger (and perhaps the reason) of hyperbolic comments in the high-end. As long as it is clearly expressed I can live with them and appreciate them.

Ron Resnick · Apr 20, 2017

Al M. said:
. . . Perhaps the best way may be to listen for some extended time (an hour or more) with one component in a stress-free manner, and then switch back to the old component for a while to assess if there is any meaningful difference. And repeat. . . .

This makes sense to me.

twitch · Apr 20, 2017

Ron, an intelligent post for sure. The powers of psychoacoustics, combined with so many 'audiophiles' trying out for Editor in Chief of TAS with their over-the-top reviews, have had me laughing for years.

Your last paragraph sums it up well!

bonzo75 · Apr 20, 2017

microstrip said:
IMHO skepticism should be on the side of the reader. I hope people feel free enough to express their opinions without fear that they are wrong. Opinions in the case you address are mainly driven by preference and clearly detailed. Also IMHO the described tests are meaningless.

Preference is the trigger (and perhaps the reason) of hyperbolic comments in the high-end. As long as it is clearly expressed I can live with them and appreciate them.

I disagree, many times someone else's experience is dismissed as his preference. While in reality he would have actually compared to establish a preference

microstrip · Apr 20, 2017

bonzo75 said:
I disagree, many times someone else's experience is dismissed as his preference. While in reality he would have actually compared to establish a preference

Can't understand what you mean exactly with your second sentence.

Folsom · Apr 20, 2017

There's no end to exploiting this horse's remains...

First off I would say some people have better and worse memory, and general abilities. But one's abilities for memory aren't the bigger factor. You need to make mental notes, or written notes, on the things you notice. If you have those then you can compare them to the new experience. If you never tried to discern anything while listening, it's going to be much harder to know the difference when you arrive.

As far as blind testing goes, the irony in this might be that humans cannot do it without a form of attachment. I'm under a strong impression after so many years that the key to hearing differences in audio isn't recall but association. First off, if you listen to a stereo for, say, several weeks and then change something, you'll know for sure whether it was you who changed it and whether or not you were informed. You might not believe it, but you will notice and wonder.

During ABX, blind, whatever, tests the biggest issue may be that you can't form any expectations for the sound you're hearing. This is because you have no association. My suggestion is you assign colored lights or cards or something to the different components you're swapping. The person being tested hasn't a clue what color corresponds to what component. But here's the factor: either one component has two colors, or one color has two components. That way the person being tested can try to discern between, say, yellow Pass amp or yellow Mulla amp, noticing that yellow is inconsistent while, say, Blue is always the same NAD amp; or they can note that two colors sound the same. You simply have a small questionnaire that they can fill out.
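The "one component has two colors" variant of this scheme could be set up roughly as follows. This is only a sketch under stated assumptions: the function names (`assign_labels`, `same_sounding_colors`, `make_playlist`) and the third color are hypothetical, and the person running the test is assumed to keep the mapping hidden from the listener.

```python
import random
from collections import defaultdict


def assign_labels(components, colors, seed=None):
    """Map every color to a component so that exactly one component gets two colors.

    Assumes there is exactly one more color than there are components.
    """
    if len(colors) != len(components) + 1:
        raise ValueError("need exactly one more color than components")
    rng = random.Random(seed)
    # One randomly chosen component appears twice in the pool.
    pool = list(components) + [rng.choice(components)]
    rng.shuffle(pool)
    return dict(zip(colors, pool))


def same_sounding_colors(mapping):
    """Answer key: the two colors that secretly share one component."""
    by_component = defaultdict(list)
    for color, component in mapping.items():
        by_component[component].append(color)
    return sorted(next(c for c in by_component.values() if len(c) == 2))


def make_playlist(colors, rounds, seed=None):
    """Play every color once per round, in a fresh random order each round."""
    rng = random.Random(seed)
    playlist = []
    for _ in range(rounds):
        block = list(colors)
        rng.shuffle(block)
        playlist.extend(block)
    return playlist
```

The listener only ever sees the colors in the playlist and writes down which two, if any, sound the same; the operator compares that answer with `same_sounding_colors(mapping)` afterwards.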

pkane · Apr 20, 2017

Ron Resnick said:
I think Peter’s experience inclines me to view even more skeptically long-duration "comparisons."

I've been assembling and comparing audio components for well over 20 years. Long ago, I came to believe that only fast, blind switching between two components using short, well-known audio tracks (usually much shorter than a minute) is how I best detect differences.

While I like the idea of a long-running evaluation, say an hour or longer, I find that I can't reliably pick the components this way, especially if the differences are minor, as would be the case with different high-end interconnects.

Folsom · Apr 20, 2017

Peter, that's how I have to do it for choosing electronic parts. They all measure the same or within a fraction of significance, and all sound different.

stehno · Apr 20, 2017

Ron Resnick said:
In a post on Al M.’s thread “ZenWave Audio D4 Interconnect” Peter A., describing a comparative listening test at Al’s house, wrote:

. . .

I think it is important to attempt to be intellectually honest. I want to commend Peter for being intellectually honest about a confusing situation in which he found himself with the A/B and A/B/X testing of Al’s interconnect cables.

While I have never met Peter in person, we have corresponded via e-mail and talked on the telephone extensively and frequently for several years now. I know Peter to be an extremely thoughtful, detail-oriented, careful and conscientious listener. Also, importantly, Peter (unlike me) regularly listens to live music. I am confident Peter's ears are better, and more accurate, than mine. So, after reading Peter’s post, I ask myself: “If Peter gets confused in a test like this, what do we, individually and collectively, even think we are doing when we compare products and listen for changes in our systems?”

I have no answers, only questions. There are a lot of opinions about the merits and problems of A/B and A/B/X testing, and about the pros and cons of short-duration A/B and A/B/X comparisons versus long duration, spend-weeks-with-a-product listening with no back-and-forth comparison.

I think Peter’s experience inclines me to view even more skeptically long-duration "comparisons." As Peter wrote, the comparison, even in the short-duration time-frame of an A/B/X test, became a test of “recall ability,” not “listening for differences.” So what realistic hope does an audiophile have trying to remember how his system used to sound after he has been listening to a new component for weeks or even months?

Are we partially, or even completely, deluding ourselves when an audiophile visits a friend’s house to listen to music, and then returns weeks later to see if the audiophile can hear a difference wrought by some change the friend made in his system? (Let's not even think about the audiophile's differences in mood, restfulness, alcohol consumption, hunger level, stress, etc., between the two listening sessions.) After listening again weeks later, the audiophile reports hearing a significant difference in response to a minor tweak in his friend’s system (“the soundstage opened up significantly,” “the midrange glare is much less evident,” “the noise floor is much lower,” “the highs are more extended,” etc.). But if we can't even make reliable comparisons during the course of a single day, how can we possibly think that we can remember accurately what our friend’s system sounded like weeks ago? (Of course, critics of A/B and A/B/X tests argue there are problems and inherently confusing issues with short-term comparison tests which are solved by long-duration auditioning.)

I am not suggesting we stop auditioning components, stop pursuing tweaks or stop listening for significant or subtle differences. I am suggesting that perhaps we should be more realistic and circumspect -- and more skeptical -- about our expressed conclusions. We should attempt to do the best we can do, and to try to remain as intellectually honest as possible, but perhaps we should acknowledge that we may be fooling ourselves about some of our listening conclusions.

First, there is no substitute for developing one's listening skills. The less developed they are, the more everything sounds the same, especially at a shootout.

A/B comparisons / shootouts might make for good opportunities to socialize, but if a product's real performance is paramount to the event, most of the time they are of little value, for at least the following reasons:

1. For most attendees the reference material is foreign. Every attendee has varying degrees of listening skills. If the attendee with the most well-trained ears has to lean forward and listen via his imaginary stethoscope, you can bet dollars-to-donuts those with less trained ears will be out of luck.

2. As mentioned in other threads, most / all electrical objects, including wires, cables, line conditioners, etc., require a certain warm-up period before performing at their full burned-in status. Though most such objects may be fully warmed up within an hour or so, there are others that require 4 hours or maybe even 2 full days before performing at their full potential. To make matters worse, some electrical objects can sound unusually horrible until they reach their full burn-in status after install.

Case-in-point: Years ago a friend with well-trained ears visited and I had some ICs he was familiar with already in place and he was pleased with what he heard. Then I showed him a pair of very musical ICs I received from a new cable manufacturer / friend and said, wait until you hear these. With that I installed the ICs and we listened. They sounded just shy of horrible. And before the second track was finished, my buddy asked, "does your friend know what he's doing?" With that, we let the system play and went out to dinner. Upon our return about 90 minutes later, he couldn't believe how fabulous they sounded and upon his return home, he ordered about 3 pair for his own system.

3. For every movement within a given system (a cable, a component, a speaker, a rack, etc.), every last one of these objects goes through some form of mechanical settling-in process, and the time required to reach full mechanical settling-in status can vary from minutes to months and, in very rare cases, to years.

PeterA · Apr 20, 2017

Interesting topic, Ron. Reflecting upon that evening of testing the two 1 m ICs, I now realize why my and our friend's input was helpful to Al. When comparing the two cables, my friend and I both preferred one to the other, both early on with unfamiliar music, and then later on with more familiar music. This was consistent during the A/B testing. I don't know if warm-up times affected the outcome as Stehno suggests they might, but we were both able to describe the differences between the cables and why we preferred one to the other. Al heard things differently, as he preferred the slightly brighter sound of the neutral cable and reflective strip, which I thought seemed to homogenize the frequency range and make things a bit bland. I think Al liked the increased resolution of the new cable. Looking back on it now, I think we were listening for different things. We all described the sound in similar ways, but we weighed particular aspects of the sound differently.

I believe that our comments led Al to rotate his Tube Traps so that the reflective strip would not face the listener. The combination of the reflective strip facing the listeners and the more neutral tonal balance and increased detail of one cable resulted in an overly bright sound. I preferred the slightly darker sounding cable because I thought it was better balanced with the room acoustic. However, once Al rotated the Tube Traps so that the absorptive side faced the listener, then the more neutral cable allowed the system to interact with the room better and the result was a more balanced, natural sound with greater resolution. We were hearing less reflection of high frequencies and more absorption of lower frequencies with the Tube Traps rotated. This resulted in less room sound, and more direct sound from the system, which benefited from the more neutral tonal balance and increased resolution of the first cable.

So, in the end, I am confident that the evening's listening session helped to sort out the differences between the two cables and resulted in a clear preference for Al. I don't know what it really says about the testing methodology, but the process was helpful to Al, which is why our friend and I were invited over in the first place.

One more comment: It is possible that Al had an expectation bias for the new cable which he was auditioning because he had heard it previously in a different system, liked it, and now was trying it in his own system. He knew all along which cable was which, so his listening was sighted, while his guests were truly blind to which cable they were hearing. It is interesting that Al initially disagreed with us, but then once he rotated the Traps and the sound changed again, we all agreed that he had struck the best combination, and it was with his new cable which the guests did not realize until after the testing was complete and the cables were revealed.

Few of us audiophiles have the skills or ability to do truly controlled listening tests of gear we have for audition. We get together and do the best we can. The tests are flawed, and we are flawed, but we have fun and often do make progress. In the end, we do what we can to try to improve our systems, hopefully in an absolute and objective sense, but usually it is just in a relative sense, based on our subjective preferences. It is a crazy hobby.

Ron Resnick · Apr 21, 2017

stehno said:
. . . Case-in-point: Years ago a friend with well-trained ears visited and I had some ICs he was familiar with already in place and he was pleased with what he heard. Then I showed him a pair of very musical ICs I received from a new cable manufacturer / friend and said, wait until you hear these. With that I installed the ICs and we listened. They sounded just shy of horrible. And before the second track was finished, my buddy asked, "does your friend know what he's doing?" With that, we let the system play and went out to dinner. Upon our return about 90 minutes later, he couldn't believe how fabulous they sounded and upon his return home, he ordered about 3 pair for his own system.

. . .

How many alcoholic beverages were consumed, and how many bonding, heart-warming stories were shared, during the 90 minutes at dinner?

bonzo75 · Apr 21, 2017

microstrip said:
Can't understand what you mean exactly with your second sentence.

If you compare your AR and CJ and prefer, say, CJ, an AR owner might dismiss it as a preference for CJ, while he might have the same preference, but has not had the experience of comparing the two like you did. We can discuss preference only after we have had the same experience. If you and I sit down together and listen to a Koetsu and an Ortofon, agree on what we heard, but then you prefer one cartridge and I the other, that's preference.

Folsom · Apr 21, 2017

Ron Resnick said:
How many alcoholic beverages were consumed, and how many bonding, heart-warming stories were shared, during the 90 minutes at dinner?

This wasn't what sold Steve on his CMS racks, was it?

stehno · Apr 21, 2017

Folsom said:
This wasn't what sold Steve on his CMS racks, was it?

He couldn't consume that much alcohol.... or was it that many stories?

DaveyF · Apr 21, 2017

An interesting question, Ron. Are we sometimes fooling ourselves with what we hear, or with what we believe we hear?
I recently acquired a pair of Shakti Hallograph room acoustic treatments. I installed these as per the manufacturer's direction and IMO they have made a very nice improvement to my already well treated dedicated room.
I then made the HUGE mistake of posting my findings and thoughts on the audioscience forum. LOL, if I wasn't nearly flamed to death in 30 seconds!! Luckily, I had my heavy duty flame suit on and so managed to escape the vitriol...barely.
You see, looking at the flimsy-looking wooden Shaktis, it is virtually impossible to conclude that they could do anything for the treatment of room acoustics! Yet, in my experience they did all that they are purported to do and they brought a superior overall sound to my system (in my room).
Most/All of the posters over there believed that what I was hearing was made up in my head.
So, after a lot of second-guessing on my part, I went back and took the Shaktis out of the system, listened at length and then put them back and listened at length again. Nope, same impression as before....am I deluding myself???
These things work in my room and with my system....others may not think so, just like many people do not believe that changing cables can make any difference. Expectation bias is typically what they will bring to the argument...and perhaps they are right....
Nah, not in my books.

KlausR. · Apr 21, 2017

Peter,

PeterA said:
Few of us audiophiles have the skills or ability to do truly controlled listening tests of gear we have for audition. We get together and do the best we can. The tests are flawed, and we are flawed, but we have fun and often do make progress. In the end, we do what we can to try to improve our systems, hopefully in an absolute and objective sense, but usually it is just in a relative sense, based on our subjective preferences. It is a crazy hobby.

Google books has Meilgaard’s “Sensory Evaluation Techniques”, check out Chapter 6.

The easiest test mentioned is the paired comparison test, where all possible pairs have to be presented, or the same/different test, with matched and unmatched pairs. As long as the listeners are told to listen for and note differences between the two components of a pair without knowing what kind of pair (AA, AB, BB, BA) they are listening to, that specific parameter is controlled. Use one listener at a time, control SPL at listening position. There are more potential sources of bias but for a start the above would probably do. If listeners want more time to listen, give them more time.
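To make the same/different test concrete, here is a minimal sketch of how one listener's session might be scheduled and scored. Everything in it is an assumption for illustration rather than anything taken from Meilgaard: the function names are hypothetical, the schedule simply repeats the four pair types equally often, and the scoring uses a one-sided Fisher exact test on the 2x2 table of pair type versus response, which is one reasonable choice among several.

```python
import random
from math import comb

PAIR_TYPES = ("AA", "AB", "BB", "BA")


def make_schedule(repeats, seed=None):
    """Each pair type appears `repeats` times, in random order."""
    rng = random.Random(seed)
    schedule = list(PAIR_TYPES) * repeats
    rng.shuffle(schedule)
    return schedule


def is_different(pair):
    """True for the unmatched pairs AB and BA."""
    return pair[0] != pair[1]


def tally(schedule, responses):
    """Count hits and false alarms.

    `responses` holds one bool per trial: True means the listener
    reported hearing a difference within the pair.
    """
    if len(schedule) != len(responses):
        raise ValueError("one response per scheduled pair is required")
    hits = sum(1 for p, r in zip(schedule, responses) if is_different(p) and r)
    false_alarms = sum(
        1 for p, r in zip(schedule, responses) if not is_different(p) and r
    )
    return hits, false_alarms


def fisher_one_sided(hits, false_alarms, n_different, n_same):
    """Probability of at least `hits` hits if responses ignore the pair type."""
    said_different = hits + false_alarms
    upper = min(n_different, said_different)
    favourable = sum(
        comb(n_different, k) * comb(n_same, said_different - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(n_different + n_same, said_different)
```

A small probability from `fisher_one_sided` suggests the listener's "different" answers track the unmatched pairs; a listener who answers "different" every time gets a probability of 1, so a bias toward hearing differences does not count as evidence.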

I think that such simple tests can be done by the layman, but probably the real audiophile is not interested in knowing the objective truth. If he hears a difference and considers that this difference is worth the money, then that’s fine for him. It is most likely that there is a real physiological reaction too, one more reason to go for the fancy cable:

http://www.pnas.org/content/105/3/1050.full

I have no problems with others using sighted listening but I certainly don’t consider opinions as being facts.

Klaus
