Early Sound Scattering & Control Room Design

Published January 1997

How can adding randomness to your monitoring improve its accuracy? Acoustic designer Andrew Parry explains how applying Early Sound Scattering (ESS) design principles can help to make studio control rooms of different dimensions sound subjectively similar.

When taking a brief for a control room design, acousticians are often told, "I don't need anything fancy, I just want to be able to rely on what I'm hearing". It doesn't matter whether the budget is a couple of hundred or a couple of hundred grand, the requirement stays the same; it's just the degree of precision that changes.

Whether the design is for a shoestring home setup or a fully‑featured commercial studioplex, the primary requirement is one of monitoring accuracy, with all the other requirements following on after that (you don't, for instance, need high levels of isolation if you're happy to work entirely on cans).

Of course, it's never quite that simple. You probably also want the room to be comfortable enough to work eight hours at a stretch, big enough to fit the whole band in for the final mix, able to have its equipment changed without altering the monitoring accuracy, sound the same in all parts of the room, and provide mixes which sound the same when you take them to another studio, take them home, or play them in the car.

Over the years, various attempts have been made to design control rooms which tell the absolute truth about the music. Consensus now seems to be that, in fact, there is no such thing as absolute truth, because what you hear is always going to depend on the environment in which you listen to it. So instead, room designers seek to create 'neutral' rooms, which impose as little as possible of their own character on a sound, whilst still providing a viable working environment for the engineer.

All sorts of clever (and some not so clever) tricks have been used in the quest for neutrality, and Early Sound Scattering looks as though it might be the next big step to achieving it. But first...

A Potted History Of Control Room Design

  • THE '60s: DEAD ROOMS

In the beginning, there was rock and roll. And perfboard and rockwool. And the producer said "Let there be the direct sound, and nog‑all else." And lo, it got the job done. But it sounded horrible, and engineers could only work in the studios for 20 minutes at a stretch, because they were almost anechoic, and the human animal can't cope with that. Well, OK, maybe it wasn't that bad, but you get the picture.

  • THE '70s: RETTINGER & EASTLAKE

Stereo happened, and people started getting interested in what was actually going on in control rooms. The best had rough stone front walls with the monitors set flush into them, and very deep absorption at the back. The front side‑walls and ceiling were raked to prevent flutter echoes. The hard front end provided the occupants with a few reflections, giving them some acoustical perspective, so it didn't feel like they had their heads in boxes of cotton wool. Typical decay times varied from about 380ms for a room of 100 cubic metres, to 430ms for 200 cubic metres. But no two rooms sounded quite alike, nor did any two places in the same room. They tried equalisers, and made them look the same on an analyser, but still they sounded different.

  • THE '80s: LEDE AND REFLECTION CONTROL

Live End Dead End (LEDE) was all the rage in the '80s. By making the area at the front of the room almost anechoic, American studio designer Don Davis and others opened up a new realm of realism in studio monitoring. The secret lay in the initial time gap between the direct sound and the first reflections; make this long enough and the brain can separate off the room acoustic, and ignore it. Result: a truly neutral room, where what you hear is exactly the same as in any other LEDE room. To preserve operator comfort, the rear wall had to be hard, but not cause a slapback echo. Based on number theory work by Manfred Schroeder, new diffuser treatments were developed (particularly by Peter D'Antonio at American company RPG Systems) to break up the echo, but still return the energy to the room as a short decay. For a fuller explanation of these, see the 'Schroeder Diffusers' box elsewhere in this article.

Reflection control was basically the same concept as LEDE, and was developed as a solution to the conflicting requirements of having a completely absorbent front end and the usual need for a studio window in the front of the room. By dint of careful geometry, you can arrange for the reflections off the glass to miss the mix position, giving the illusion of complete absorption. With even more careful geometry, you can have many reflections from the front surfaces of the room, all of them missing the mix position, forming an RFZ (reflection‑free zone). For the zone to remain reflection‑free, the rest of the room needs to be anechoic, or at least highly absorbent. The effect is stunning, provided you sit in exactly the right place, nobody puts a rack of keyboards behind you, and you don't want any effects racks, tape machines or anything else in the rear half of your control room. LEDE and RFZ rooms all seek to achieve essentially the same objective: a room which imposes none of its own character upon the signal. They do this primarily by not allowing any of the early reflections to reach the engineer's ears. This poses a problem when you want to put any kit in the room, because you unavoidably get reflections off it which the room designer wasn't expecting.

Enter Early Sound Scattering

One logical alternative to the LEDE/RFZ approach is to build a room in which the characteristic reflections are so uniformly random that they have no character to impose. The ESS control room is one which features a highly diffusive front end (including the walls into which the monitors are built), which scatters the early sound. The body of the room is absorbent, with most of the low‑frequency absorption provided by damped membrane panels. The room can be made fairly live compared to older control rooms, with a flat frequency response and good stereo imaging, both of which remain stable right to the rear corners of the room.

The concept of surrounding the monitors with diffusers was an invention born of necessity, when a large Amek 9098 desk was fitted at Lisa Stansfield's fairly small Rochdale studio. RFZ geometry just wouldn't work in this case — the desk would have had to go too close to the speakers — so an alternative was needed. The original purpose of all the diffusers was to produce enough early energy to mask the reflection from the large desk, and so reduce comb‑filter effects. At worst, the results could have turned out equal to the best rooms of the '70s, but with the advantage of having the erratically random stone replaced by statistically perfectly random diffusers. The same approach was taken to avoid geometry problems when Lisa Stansfield's home studio was constructed in Dublin. When both studios were completed, it was found that material could be transferred between the two with complete confidence, despite the two rooms being radically different in shape and size.

So, How Does It Work?

  • STEREO IMAGING

A common assumption about diffusion is that, by smearing the signal in time as well as space, the stereo image is bound to be destroyed utterly. This, however, has turned out not to be the case. A stereo image is a psycho‑acoustic illusion: a trick played on the brain and ears. The ears gather whatever information they can, and the brain makes whatever sense it can of the information. When the information is conflicting, the brain fails to make sense of it, and the illusion is lost. The information of most interest to the brain is the level difference between left and right ears, but timing is also very important. If the timing information conflicts with the level information, the image disappears. Reflections assist the brain in localising a sound source, but that is not the aim when trying to form a stereo image. Scrambling the timing information makes it more difficult to localise the loudspeaker itself, leaving the level information, uncontradicted, to provide the image. The resulting image, while not quite as dramatic as that found in a well set‑up RFZ room, is reliable regardless of changes of equipment in the rear of the room, and extends the full width of the desk and right to the back wall.

  • FREQUENCY RESPONSE

The most readily‑grasped measure of a control room's 'quality' is its steady‑state frequency response, as shown on a spectrum analyser with a pink noise signal source.

Although popular in the late '70s, the use of equalisers to compensate for room acoustics is now generally frowned upon, except in certain circumstances. In particular, if you flush‑mount speakers which were designed to be free‑standing, a bass lift will result, because the speaker is now radiating into a hemispherical space the same low‑frequency power that it was designed to radiate omnidirectionally. In this instance, a bass cut may be applied in the feed to the amplifier; a steady‑state remedy for a steady‑state anomaly. Using a graphic equaliser for this task is unwise, as each filter causes all sorts of phase shifts at its turnover points, causing a loss of definition at the bottom end. A simple 'bypassed pad' first‑order bass cut causes the least possible disturbance to the phase at low frequency.

Provided your speakers have been built right, the steady‑state frequency response of your system depends mainly on the room's decay time response. To achieve a flat frequency response, the decay time of the room must be approximately equal in each octave band. Equal decay times may be achieved at mid and high frequencies by specifying suitable absorbent treatments for the walls and ceiling. Typical absorbers in this frequency range include foam tiles, drapes and soft furnishings, and mineral or glass fibre matting up to 200mm thick. Deep trapping, Helmholtz and membrane absorbers and resonant pipes may be used to control low‑frequency decay, but because low‑frequency propagation is primarily by excitation of room resonances, close attention must also be paid to the shape of the room. The room proportions (the ratios of height to width to length) should closely approach one of Bolt's ideal ratios (worked out in the middle of this century), which distribute the resonances evenly with respect to frequency (for a fuller explanation of resonances and details of Bolt's ratios, see the 'Exciting Rooms' box elsewhere in this article). In non‑rectangular rooms, the averaged dimensions should still be made to fit one of Bolt's ideal ratios, as the non‑rectangularity will essentially only damp the resonances, and not eliminate them.

Comb filtering is the effect where a delayed signal cancels with the direct signal at frequencies where the path difference is an odd number of half wavelengths. The depth of the cancellation notch depends on the difference in level between the two signals, with complete cancellation if they are exactly equal. The effect on the sound varies throughout the room, because the extra distance the reflected ray travels varies. By spreading the reflection out in time, Schroeder diffusers close to the loudspeakers provide a highly effective method of minimising the effect. Where the primary reflection is from a diffuse surface, the reflection will be markedly reduced in level, as the energy is being dispersed in many directions, so much smaller cancellations will be produced. Moreover, because the spatial diffusion is accompanied by temporal diffusion, the notches are damped to the point of non‑existence. If the primary reflection is not from a diffuse surface, it will still be fed by the diffuse area at the speaker, with much the same effect.
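The arithmetic behind that description can be sketched in a few lines of Python. This is only a rough illustration, assuming a single specular reflection added to the direct sound; the function names, the 0.344m path difference and the two reflection levels are hypothetical values chosen for the example, not measurements from any of the rooms discussed. It models only the drop in reflection level, not the smearing in time that a diffuser also provides.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the figure quoted in this article


def notch_frequencies(path_difference_m, f_max_hz=20000.0):
    """Frequencies (Hz) at which the extra path is an odd number of half wavelengths."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) * SPEED_OF_SOUND / (2.0 * path_difference_m)
        if f > f_max_hz:
            return notches
        notches.append(f)
        k += 1


def notch_depth_db(reflection_level_db):
    """Level at the bottom of a notch, in dB relative to the direct sound alone.

    reflection_level_db is the level of the single reflection relative to the
    direct sound (0 = equal level, negative = quieter).
    """
    gain = 10.0 ** (reflection_level_db / 20.0)
    residual = abs(1.0 - gain)
    if residual == 0.0:
        return float("-inf")  # equal levels: complete cancellation
    return 20.0 * math.log10(residual)


if __name__ == "__main__":
    # Hypothetical reflection travelling 0.344 m further than the direct sound.
    print([round(f) for f in notch_frequencies(0.344, f_max_hz=3000.0)])
    # Illustrative reflection levels only (assumed, not measured).
    for level in (-1.7, -14.0):
        print(level, "dB reflection gives a notch of", round(notch_depth_db(level), 1), "dB")
```

With these assumed figures, a reflection only a couple of dB below the direct sound produces notches about 15dB deep, while one 14dB down leaves a ripple of about 2dB.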

Examining the impulse response of an ESS room compared with a similarly‑dimensioned RFZ room reveals that the inevitable desk reflection has changed from a tall spike into a squat hump. This translates into the frequency domain as exchanging deep, narrow notches in the high‑frequency region, of up to about 15dB depth, for about 2dB of gentle ripple. The improvement in high‑frequency phase coherence that removing the deep notches provides is hard to quantify, but hi‑fi buff words like clarity, naturalness and transparency spring to mind.

  • SPATIAL UNIFORMITY

The use of loudspeakers with a highly hemispherical output is central to the ESS design, in order that sufficient energy is delivered onto the diffusers close to them. This, in turn, means that off‑axis listeners will receive a very similar direct sound spectrum to those on‑axis. This, of course, is nothing new, and soft‑dome speakers have been increasingly popular since the early '80s.

However, anyone who has ever studied any physics knows that two point sources in phase produce fringing effects, which will cause a room to have a different frequency response at every point in space. This is basically the same problem as the comb filter effect described above, except now we're talking about two sources and a spatial anomaly, rather than one source, a reflection, and a frequency domain anomaly. If you haven't ever noticed this, try listening to some 1kHz tone in mono on two speakers, and move your listening position from side to side by a foot or so. The level changes dramatically, as does the apparent direction as you move through the fringes, or hot spots. The big difference with the ESS room is that this fringing is almost completely absent. The diffusers close to the speakers effectively convert the speakers to large plane sources, which do not suffer from the same constructive and destructive interference effects, removing the biggest obstacle to achieving consistency of frequency response throughout the room. Also, the imaging benefits from this removal of hot spots, because the level differences at the ears are more likely to resemble those at the speakers. Many control rooms also exhibit a nasty bass lift close to the back wall. In any closed space, close to the boundaries, you get a rise in level at low frequency due to the pressure zone effect. The use of damped membrane absorbers, especially on the rear wall, where the effect is most pronounced, minimises this problem. The mathematics of why this works is beyond the scope of this article, but concerns the phase shift at which the membrane re‑radiates the energy it fails to absorb.
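The 1kHz mono experiment can be roughed out numerically. The sketch below assumes two ideal in-phase point sources in free field, 2m apart, with the listener 2m back from the line between them; those dimensions and the function name are assumptions made for the example, and no room reflections or diffusers are modelled.

```python
import cmath
import math

SPEED_OF_SOUND = 344.0  # m/s, the figure quoted in this article


def mono_level_db(offset_m, freq_hz=1000.0, spacing_m=2.0, distance_m=2.0):
    """Level of two in-phase point sources at a listening position offset_m
    to one side of the centre line, in dB relative to the centre position.
    Free field only: no room reflections are modelled."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m

    def pressure(x):
        total = 0j
        for speaker_x in (-spacing_m / 2.0, spacing_m / 2.0):
            r = math.hypot(x - speaker_x, distance_m)  # distance to this speaker
            total += cmath.exp(-1j * k * r) / r        # spherical spreading
        return abs(total)

    return 20.0 * math.log10(pressure(offset_m) / pressure(0.0))


if __name__ == "__main__":
    # Step the listening position sideways in 5 cm increments.
    for step in range(0, 9):
        x = step * 0.05
        print(f"{x:.2f} m off centre: {mono_level_db(x):6.1f} dB")
```

Under these assumptions the level falls by well over 15dB a little under 20cm off centre and recovers again within the next 20cm, which is the fringe pattern you hear when moving your head a foot or so.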

  • DECAY TIME

The decay time in the control room greatly affects the comfort of the engineer, and too short a decay can cause fatigue after quite a short time. In 1977, Michael Rettinger, in his AES paper On the Acoustics of Control Rooms, determined that the perceived liveness of a room depends upon the ratio of the decay time to the room volume, and suggested an ideal relationship for a recording control room. Since then, probably due to increasing awareness of the need for engineer comfort, rooms have tended to be built to be slightly more live than this, and ESS rooms are normally designed to have a decay about 20% longer than that suggested by Rettinger. When calculating control room decay times, Sabine's simple formula (see the 'Sabine, Eyring, and Fitzroy' box elsewhere in this article for more on this) is inadequate, as the room is both highly absorbent and non‑uniformly covered. Accordingly, Eyring's formula resolved along three axes after Fitzroy is recommended.

  • REPEATABILITY

The biggest factor which makes reflection‑control rooms different from each other, given that the designer intended them to sound identical, is the assortment of other kit that ends up in the room. If the accuracy of the room relies upon freedom from early reflections, one reflection from behind the engineer makes a huge difference to the overall sound, and variations in position or size of the racks, trollies, and keyboard stands will cause no end of variation in the room acoustic. If, instead, these extraneous arrivals are just a minute part of a cloud of diffuse arrivals, the effect of changing them, within reasonable limits, is negligible, and therefore two quite different room layouts can sound almost identical.

Home Studio Applications

Most home studios are not quite so grand as those shown in the photos accompanying this article, so you may well be wondering how all this affects you, and whether your studio can actually benefit from early sound scattering at all. If you're working with anything more than purely nearfield monitoring, the answer is yes. Even in rectangular rooms with free‑standing or bracket‑mounted speakers, the addition of areas of diffuser on the side walls, and on the front wall if possible, will produce many desirable early reflections which will help to mask the room character. The most important frequencies to scatter are in the 1‑5kHz region, and dimensions for such a diffuser are given in Figure 1. The diffusers should be applied for a couple of feet either side of each primary reflection point, and if you're not sure where that is, you probably ought to be getting an acoustician to do the geometry for you.

So, there you have it. Add enough smooth randomness to any imperfect system, and the imperfections virtually disappear. These rooms really work, and give a good representation of what your mix will sound like away from the studio. They're pleasant to work in, and can be tailored to suit even quite modest construction budgets without greatly compromising performance.

About The Author

The author has been involved in professional audio since the early '80s. After a few years of running and maintaining PA rigs of various shapes and sizes, he started building custom gadgets and systems for stage and studio use, whilst also offering maintenance services to small studios. He ran the technical department of a well‑known studio equipment retailer for five years, while studying acoustics in his spare time. Since 1990 he has been an independent studio consultant, designing, installing and maintaining studios. He would welcome correspondence on the ideas presented in this article — letters to The Editor at the usual SOS address.

Schroeder Diffusers

A Schroeder diffuser is a structure comprising a number of wells of different, carefully‑chosen depths. As a ray of sound strikes the irregular surface, instead of bouncing off it like a mirror, it bounces out of each well at a slightly different time. The result is many small reflections, spread out in both time and space. The frequencies at which such structures operate as diffusers depend upon their dimensions, with the lower limit being that frequency where the deepest well is a quarter‑wavelength, and the upper limit being where the width of each well is equal to half a wavelength. The operating range of a single diffuser is limited to about four octaves, because if the deepest well is deeper than about fifteen times its width, it begins to behave as a diaphragmatic absorber. The way it actually works is a bit complicated, but here goes.

Any wavefront travelling in a particular direction may be considered as being made up of an infinite number of side‑by‑side omnidirectional 'secondary wavelets'. The direction of propagation of the wave depends on the spatial arrangement of the notional sources of these wavelets, or on their phase relationship (same thing, really). If a wave is reflected by a Schroeder diffuser, each well produces a reflection at a slightly different time, due to its different depth. The phasing of these reflected wavelets is what determines the direction of the reflected wave, and if the diffuser is correctly designed, the reflected wave will depart in many directions. (A theoretical diffuser having infinite wells would reflect the wave in a perfect hemidisc.) The wells are arranged in a cyclic sequence, and the best sequences consist of a prime number of wells per cycle.

A number of ways of determining the well depths have been tried over the years, but by far the most popular is the quadratic residue sequence. (I tried to get away without using the words 'quadratic' and 'residue', but I just couldn't help it.) To quote Dr Peter D'Antonio, these sequences 'have the unique property that the Fourier transform of the exponentiated sequence values has constant magnitude in the diffraction directions'. The well depths are given by the following equation:

$$d = \frac{(h^2 \bmod N)\,L}{2N}$$

Here, d is the depth, h is the well number, N is the prime number on which the sequence is based, and L is the wavelength of the lowest operating frequency.
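As a worked example, here is a minimal Python sketch of the usual quadratic residue rule, in which the depth of well h is (h squared modulo N) multiplied by L and divided by 2N. The function names are hypothetical, N is assumed to be prime, and the choice of N = 7 with a 1kHz lower limit is an assumption made purely for illustration.

```python
SPEED_OF_SOUND = 344.0  # m/s, the figure quoted in this article


def quadratic_residues(n_prime):
    """Sequence h*h mod N for one cycle of N wells (N assumed to be prime)."""
    return [(h * h) % n_prime for h in range(n_prime)]


def well_depths_m(n_prime, lowest_freq_hz):
    """Well depths in metres for one cycle, using d = (h^2 mod N) * L / (2N)."""
    wavelength = SPEED_OF_SOUND / lowest_freq_hz  # L, the design wavelength
    return [r * wavelength / (2.0 * n_prime) for r in quadratic_residues(n_prime)]


if __name__ == "__main__":
    # Illustrative design only: seven wells per cycle, 1 kHz lower limit.
    for h, depth in enumerate(well_depths_m(7, 1000.0)):
        print(f"well {h}: {depth * 1000.0:5.1f} mm")
```

For N = 7 the sequence runs 0, 1, 4, 2, 2, 4, 1, so the cycle is symmetrical about its middle and the deepest wells in this example come out a little under 100mm.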

Exciting Rooms: Room Shape

Low frequencies, from about 200Hz down in typical control rooms, behave very differently from higher ones. High frequencies travel like a light ray: in a straight line from the speaker to your ear. With low frequencies, the speaker dumps energy into the room, exciting the room's natural resonances, and it is these resonances that then couple into your ears. If the room shape is such that its resonances, or modes, are all bunched together, then at some frequencies there will be a big lift in what you hear, while at others, where the room does not respond, there will be a big dip. The modes of a room come in three flavours, axial, tangential, and oblique. Axial modes occur, as their name suggests, along the axes of the room; front to back, side to side, and floor to ceiling. They are the easy ones to predict the frequency of, because they occur at all multiples of the frequency at which the length, width or height of the room is half a wavelength. Tangential modes are a bit harder to calculate, as they take in any two pairs of opposite surfaces, and oblique modes are even worse, making the grand tour of all six surfaces. If you really need to know the frequency of a particular mode, use the following equation:

$$f = \frac{c}{2}\sqrt{\left(\frac{n_x}{L_x}\right)^2 + \left(\frac{n_y}{L_y}\right)^2 + \left(\frac{n_z}{L_z}\right)^2}$$

Here, f is in Hz, c, the speed of sound, is about 344m/s, L is the room dimension and n is the order of the mode along each of the three axes x, y and z. The point of all this is that, unless your room is a really bad shape, you're not actually all that interested in the frequencies of all these modes, only in how evenly spread out they are. If they're poorly spread out, then where they clump together, the room will show a response peak, and low levels at other frequencies. In smallish rooms, the region which tends to suffer the most in this way is from about 50Hz to 150Hz, right where you need the most reliable response for mixing. The maths is too complicated to go into here, but if the ratios of height to width to length (in any order) are 1.14:1.39:1 or 1.28:1.54:1 or 1.60:2.33:1 (Bolt's golden ratios), then the modes will be perfectly spaced, and low‑frequency response is pretty much guaranteed to be smooth.
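If you would rather let a computer do the sums, the following Python sketch lists the modes of a rectangular room using the standard formula f = (c/2) x sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). The function names and the example room (2.5m high, scaled to the 1.28:1.54:1 ratio) are assumptions chosen for illustration; the room is treated as an ideal rigid‑walled box.

```python
import math
from itertools import product

SPEED_OF_SOUND = 344.0  # m/s, the figure quoted in this article

# Number of non-zero mode orders -> type of mode.
MODE_TYPES = {1: "axial", 2: "tangential", 3: "oblique"}


def mode_frequency(orders, dimensions_m):
    """Frequency (Hz) of the mode with orders (nx, ny, nz) in a rectangular room."""
    return (SPEED_OF_SOUND / 2.0) * math.sqrt(
        sum((n / d) ** 2 for n, d in zip(orders, dimensions_m))
    )


def room_modes(dimensions_m, f_max_hz=150.0):
    """Sorted list of (frequency, orders, type) for every mode up to f_max_hz."""
    # No mode at or below f_max_hz can have an order higher than this.
    highest_order = int(2.0 * f_max_hz * max(dimensions_m) / SPEED_OF_SOUND) + 1
    modes = []
    for orders in product(range(highest_order + 1), repeat=3):
        if not any(orders):
            continue  # (0, 0, 0) is not a mode
        f = mode_frequency(orders, dimensions_m)
        if f <= f_max_hz:
            kind = MODE_TYPES[sum(1 for n in orders if n)]
            modes.append((f, orders, kind))
    return sorted(modes)


if __name__ == "__main__":
    # Hypothetical room: 2.5 m high, 3.2 m wide, 3.85 m long (1 : 1.28 : 1.54).
    for f, orders, kind in room_modes((3.85, 3.2, 2.5)):
        print(f"{f:6.1f} Hz  {orders}  {kind}")
```

Scanning down the printed list for large gaps or tight clusters gives a quick feel for how evenly a given set of dimensions spreads its resonances.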

State Of Decay: Sabine, Eyring, & Fitzroy

The simplest way to predict the decay time of a room is by using the equation formulated in the early years of this century by the physicist WC Sabine:

$$T = \frac{0.161\,V}{A}$$

Here, T is the RT60 decay time, V is the volume of the room in cubic metres, and A is the total absorption in the room, in metric Sabins. This is fine for predicting decay times in fairly reverberant spaces, where the absorption is evenly distributed and the average absorption co‑efficient is no more than about 0.2. You just add up all the areas of absorber multiplied by their co‑efficients to get the value for A, and out pops your answer. In large spaces, A should also include an allowance for absorption by the air, which depends on temperature, frequency, and relative humidity.

CF Eyring followed up Sabine's work, improving upon his formula to make it applicable to less reverberant spaces, by treating the waves as though they were being absorbed only at the surfaces.

Another physicist, Fitzroy, also later improved upon Sabine by allowing the absorbent material to be distributed unevenly:

$$T = \frac{0.161\,V}{S^2}\left(\frac{S_x}{\alpha_x} + \frac{S_y}{\alpha_y} + \frac{S_z}{\alpha_z}\right)$$

Here, Sx, Sy and Sz are the areas of absorber projected onto the three axes of the room, S is their total, and αx, αy and αz are the average absorption co‑efficients on each axis. Since control rooms tend to be fairly dead and non‑uniform, it seems logical to replace the Sabine expression in each term of Fitzroy's formula with Eyring's, and this has been found to produce results which correspond well with measured values. The large quantity of sums involved in calculating decay times at octave centres suggests the use of either a spreadsheet or a dedicated computer program to enable a design‑by‑trial process.
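By way of illustration, the three approaches might be sketched in Python as below. This is not the author's program: it assumes the common textbook forms T = 0.161V/A for Sabine and T = 0.161V/(-S ln(1 - a)) for Eyring, where S is the total surface area and a the average co‑efficient, and it assumes that the Fitzroy combination weights one Eyring term per axis by that axis's share of the total area. The function names, the 5m x 4m x 3m room and its co‑efficients are all hypothetical.

```python
import math


def sabine_rt60(volume_m3, surfaces):
    """Sabine: surfaces is a list of (area_m2, absorption_coefficient) pairs."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption


def eyring_rt60(volume_m3, surfaces):
    """Eyring: uses the area-weighted mean coefficient of all surfaces."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume_m3 / (-total_area * math.log(1.0 - mean_alpha))


def fitzroy_eyring_rt60(volume_m3, axis_surfaces):
    """Assumed Fitzroy form: one Eyring term per axis, weighted by area share.

    axis_surfaces holds three (area_m2, mean_coefficient) pairs, one for each
    pair of opposite surfaces (end walls, side walls, floor and ceiling).
    """
    total_area = sum(area for area, _ in axis_surfaces)
    return sum(
        (area / total_area)
        * 0.161 * volume_m3 / (-total_area * math.log(1.0 - alpha))
        for area, alpha in axis_surfaces
    )


if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 3 m room: absorbent walls, fairly hard floor and ceiling.
    volume = 5.0 * 4.0 * 3.0
    axes = [(24.0, 0.6), (30.0, 0.6), (40.0, 0.1)]  # illustrative values only
    print("Sabine :", round(sabine_rt60(volume, axes), 2), "s")
    print("Eyring :", round(eyring_rt60(volume, axes), 2), "s")
    print("Fitzroy:", round(fitzroy_eyring_rt60(volume, axes), 2), "s")
```

When the absorption is spread evenly, the Fitzroy form collapses to plain Eyring; when one pair of surfaces is much harder than the others, it predicts a noticeably longer decay, which is the behaviour that makes it attractive for unevenly treated rooms.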