Q. Why don’t we place monitor speakers either side of the head, like headphones?

The traditional stereo monitoring setup, with two speakers and the listener’s head at the points of an equilateral triangle, is important for stereo image stability, as it recreates the inter‑aural time differences that allow us to locate a sound source’s position.

I am planning a move of my home studio to a new room and have been thinking about layout and where to put everything. Of course the speaker location matters, but I was wondering why we put speakers in the classic triangle with the listening point at the apex? Surely putting them at 90 degrees to the ear, much like a huge pair of headphones, makes more sense — perfect separation and all that?

SOS Forum post

SOS Technical Editor Hugh Robjohns replies: It dates back to the early 1930s, and the pioneering work of Alan Blumlein at EMI. He worked out that an arrangement with the speakers at ±30‑degree angles from the forward axis of a listener, in an equilateral triangle with the listener at the apex, was the ideal configuration for creating the illusion of directional sound information and the ‘stereo image’. This illusion relies on the ears and brain being fooled into converting inter‑channel level differences between the loudspeaker outputs into ‘fake’ time‑of‑arrival information at the ears.

To get a handle on this process, consider what happens when a continuous tone is replayed over both loudspeakers and heard by both ears of a listener. (In fact, tones are hard to locate in real life due to their lack of transients, but the underlying physics is still valid and the diagrams are easier to draw and comprehend!) Clearly, the path length from the right speaker to the left ear is slightly longer than that from the left loudspeaker to the left ear, so sounds from the right speaker will arrive slightly later (by a known and fixed amount set by the speaker angles). The sounds from both loudspeakers combine acoustically as they enter the ear to create a new composite sound wave, and it is these composite sound waves that matter.
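To put a rough number on that 'known and fixed amount', here is a minimal sketch that treats the ears as two points a fixed distance apart and the speaker as a distant source. The ear spacing of 0.18m and the speed of sound of 343m/s are assumed values for illustration, not figures from the reply above, and the model ignores the way sound bends around the head.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)
EAR_SPACING = 0.18      # m, assumed effective distance between the ears


def interaural_delay(speaker_angle_deg, ear_spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Extra travel time (seconds) to the farther ear for a distant source.

    speaker_angle_deg is measured from the listener's forward axis.
    Simplified two-point head model: path difference = spacing * sin(angle).
    """
    path_difference = ear_spacing * math.sin(math.radians(speaker_angle_deg))
    return path_difference / c


delay_30 = interaural_delay(30.0)
print(f"Delay to the far ear at 30 degrees: {delay_30 * 1e6:.0f} microseconds")
```

With these assumed values the delay comes out at a few hundred microseconds, which is the order of magnitude the rest of the explanation relies on.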

If the two loudspeakers produce equally loud sounds, the resultant combinations at both ears are obviously identical, and each ear receives a composite tone which appears to start slightly after the direct sound from the nearer speaker but earlier than that from the farther speaker. (See diagram above.)

The composite sound waves arriving in each ear have the same amplitude and, critically, the same apparent arrival time at both ears. Since there is no apparent time‑of‑arrival difference at the ears, the brain interprets the sound as coming from a ‘phantom image’ midway between the two loudspeakers. (For a sound to arrive with zero time‑of‑arrival difference, it must be located somewhere on the median line of the head, and since we know sounds come from loudspeakers, the brain assumes a source midway between them.)

Now, consider what happens if the left‑hand loudspeaker produces a tone which is louder than, but otherwise identical to, that from the right‑hand loudspeaker. In this situation, the resultant combination of sounds at the left ear generates a composite sound wave which is identical in level to the composite received at the right ear, but now those left‑ and right‑channel level differences result in an artificial time‑of‑arrival difference. The louder signal from the left speaker effectively pulls the apparent start time earlier for the left ear, and later for the right ear, and this is interpreted by the human hearing system as a single sound source located closer to the left loudspeaker.
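The way a level difference turns into a timing difference can be sketched by adding the two tones as phasors at each ear. This is only an illustrative model under stated assumptions: a single 500Hz tone, a fixed 262-microsecond extra delay for the sound from the opposite speaker (from an assumed 0.18m ear spacing), and no head shadowing. The function names are hypothetical.

```python
import cmath
import math


def ear_phasors(left_gain, right_gain, freq_hz=500.0, crosstalk_delay=262e-6):
    """Composite tone at each ear, as complex phasors (left_ear, right_ear).

    Each ear hears its nearer speaker directly, plus the opposite speaker
    delayed by crosstalk_delay seconds (an assumed value).
    """
    lag = cmath.exp(-2j * math.pi * freq_hz * crosstalk_delay)
    left_ear = left_gain + right_gain * lag
    right_ear = right_gain + left_gain * lag
    return left_ear, right_ear


def apparent_itd(left_gain, right_gain, freq_hz=500.0, crosstalk_delay=262e-6):
    """Apparent time lead (seconds) of the left ear over the right ear."""
    left_ear, right_ear = ear_phasors(left_gain, right_gain, freq_hz, crosstalk_delay)
    phase_lead = cmath.phase(left_ear) - cmath.phase(right_ear)
    return phase_lead / (2 * math.pi * freq_hz)


for left_gain, right_gain in [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]:
    left_ear, right_ear = ear_phasors(left_gain, right_gain)
    itd_us = apparent_itd(left_gain, right_gain) * 1e6
    print(f"L={left_gain} R={right_gain}: "
          f"levels {abs(left_ear):.3f}/{abs(right_ear):.3f}, lead {itd_us:+.0f} us")
```

In this model the composite levels at the two ears stay equal whatever the channel gains, while the apparent time lead is zero for equal gains and swings towards whichever speaker is louder, which matches the behaviour described above.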

Consequently, altering the relative output levels of the two channels (eg. with a pan pot) creates different artificial time‑of‑arrival differences at the ears when listening to speakers in the equilateral triangle arrangement, and thus we can position sounds anywhere between the two speakers in the stereo image.

Having discovered this phenomenon, Blumlein went on to develop the concept of coincident microphone arrays to capture the positions of real sound sources in a way that would translate accurately to loudspeaker listening. Since the left and right mics in a coincident array occupy the same point in space, there can be no timing differences between their signals, but their polar patterns will impose level differences dependent on the angle of incidence of the sound waves. Those level differences get translated back into artificial time‑of‑arrival differences via the speakers, and thus create a stable stereo image.
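As an illustration of how polar patterns alone produce level differences, the sketch below assumes one particular coincident array: two ideal figure‑of‑eight mics angled at ±45 degrees. That choice of pattern and angle is an assumption for the example, as the text above does not specify either, and the function name is hypothetical.

```python
import math


def coincident_pair_gains(source_angle_deg, mic_angle_deg=45.0):
    """Gains of an assumed crossed figure-of-eight pair for one source.

    source_angle_deg: direction of the source, positive towards the left.
    mic_angle_deg: each mic is aimed this far either side of centre.
    An ideal figure-of-eight has a gain of cos(angle off its own axis).
    """
    left = math.cos(math.radians(source_angle_deg - mic_angle_deg))
    right = math.cos(math.radians(source_angle_deg + mic_angle_deg))
    return left, right


for angle in (0.0, 15.0, 30.0, 45.0):
    left, right = coincident_pair_gains(angle)
    print(f"source at {angle:4.1f} deg: left {left:.3f}, right {right:.3f}")
```

A central source gives equal gains in both channels, and the left channel becomes progressively louder as the source moves left, with no timing difference involved at any point.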

However, to maintain the illusion of a stable stereo sound stage it is essential that the listener is in the correct position relative to the loudspeakers: at the apex of the equilateral triangle. If the listener moves well off to one side, for example, the stereo image quickly collapses into the nearer loudspeaker. This is due to something called the Haas Effect, whereby once the time‑of‑arrival difference at the ears exceeds a certain amount, the brain becomes fixated on the sound which arrives first, and ignores later‑arriving sounds.
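To see how quickly the timing changes when the listener moves sideways, the following sketch works out the difference in arrival time from the two speakers. The 2m triangle side and the 343m/s speed of sound are assumed values, and the listener is treated as a single point.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_time_difference(offset_m, side_m=2.0, c=SPEED_OF_SOUND):
    """Seconds by which the right speaker's sound arrives before the left's.

    The speakers sit at the two base corners of an equilateral triangle
    with sides of side_m; offset_m is how far the listener has moved to
    the right of the apex, parallel to the line joining the speakers.
    """
    half_base = side_m / 2
    depth = side_m * math.sqrt(3) / 2  # distance from the speaker line to the apex
    to_left = math.hypot(offset_m + half_base, depth)
    to_right = math.hypot(offset_m - half_base, depth)
    return (to_left - to_right) / c


for offset in (0.0, 0.1, 0.25, 0.5):
    ms = arrival_time_difference(offset) * 1e3
    print(f"{offset:.2f} m off centre: right speaker leads by {ms:.2f} ms")
```

Under these assumptions, moving half a metre off centre produces a lead of more than a millisecond, several times larger than the few hundred microseconds involved in the time‑of‑arrival differences at the ears.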

That’s why moving the speakers to ±90 degrees can’t create a stable stereo image. It’s also related to the reason the stereo image heard over headphones lacks stability and feels unnatural to most people. Each ear under headphones hears only its own channel, and as there is no inherent ‘bleed’ from the other channel the artificial time‑of‑arrival differences are not reconstructed: sounds therefore tend to ‘puddle’ at each ear rather than stretch across a stable image. The ‘crossfeed’ option included in some headphone amps and software attempts to reintroduce the opposite‑channel bleed to recreate time‑of‑arrival differences, but its success depends to some extent on how well the crossfeed parameters match your own head dimensions.
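The basic idea of crossfeed can be sketched as mixing a delayed, quieter copy of each channel into the opposite one. This is a deliberately simplified illustration, not how any particular headphone amp or plug‑in works: the gain and delay values are assumptions, the function name is hypothetical, and practical designs typically also filter the bled signal.

```python
def crossfeed(left, right, gain=0.3, delay_samples=12):
    """Add a delayed, attenuated copy of each channel to the opposite one.

    left, right: equal-length lists of samples.
    gain: level of the bleed relative to the direct signal (assumed value).
    delay_samples: bleed delay; 12 samples is about 272 microseconds at
    44.1kHz (assumed value, standing in for the listener's head size).
    """
    if len(left) != len(right):
        raise ValueError("left and right must be the same length")
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay_samples
        bleed_from_right = right[j] if j >= 0 else 0.0
        bleed_from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + gain * bleed_from_right)
        out_right.append(right[i] + gain * bleed_from_left)
    return out_left, out_right


# A click in the left channel only: it now also reaches the right ear,
# later and quieter, much as it would from a loudspeaker.
click_left = [1.0, 0.0, 0.0, 0.0, 0.0]
silence = [0.0, 0.0, 0.0, 0.0, 0.0]
demo_left, demo_right = crossfeed(click_left, silence, gain=0.5, delay_samples=2)
print(demo_left, demo_right)
```

The delay is the parameter that stands in for head size here, which is why a fixed setting will suit some listeners better than others.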