DAVID GRIESINGER (LEXICON): Creating Reverb Algorithms For Surround Sound

Interview | Manufacturer | By Paul White
Published March 2000

David Griesinger.

The current challenge to reverb designers such as Dave Griesinger of Lexicon is to create realistic algorithms for surround sound. Paul White finds out how he's approached the problem.

Lexicon's reputation for high‑quality reverb effects is in no small measure down to Dr David Griesinger, the man behind their closely guarded reverb algorithms. I have met David on several occasions and get the impression that his life is one big R&D project, where ultimate reverb comes just slightly ahead of the Holy Grail in terms of importance!

With the increasing popularity of surround sound in the cinema, TV, and in the near future, music, Lexicon clearly had to address the problem of creating surround reverb. To this end, they've developed a brand‑new high‑end reverb processor known as the 960L (the full launch of which is expected at the AES show in February) and a surround card for the well‑established Lexicon 480, a project David is overseeing personally. I managed to arrange a brief meeting with him at Lexicon's Boston headquarters, where we spoke about his strategies for creating the aural illusion of space.

Hearing In Surround

How far do you have to modify your ideas to produce a reverb algorithm that will work for surround sound as opposed to basic stereo?

"Well, a lot of things have changed in the last 15 years, and one of them is that I have gone in a very different direction. Instead of asking what happens in a real room — looking at the reflections and transfer properties — I've asked the question 'How do we perceive real rooms?' I've moved from a room‑based model to a perception‑based model. What I've tried to do is understand what's audible to humans and what is not, and how that interacts with how we perceive both speech and music.

"I've not tried to generate real rooms but patterns of reflections and reverberations that do what is optimum perceptually, whether a room could do that or not. What I'm interested in is how you can generate a reflection pattern — you may not even want to call it a reflection pattern any more — which gives you the properties of music that are desirable and not the properties that are undesirable, if you can identify which those are."

Intelligibility, Distance & Envelopment

Lexicon's forthcoming 960L reverb processor, with its LARC remote‑control unit.

When it comes to creating the impression of a believable surround room, what have you identified as the factors that matter most?

"I divided the perception issues into three areas. Firstly, there's the whole issue of intelligibility and what influences intelligibility; what makes something hard to understand as opposed to easy to understand, both musically and for the spoken word. Then there's an effect you might class as distance, and another effect which has to do with envelopment, or the sense of reverberation. It turns out that those perceptions are influenced very much by different time ranges in the reflected energy.

"You have the sense of distance, which is influenced by nearly any time range — reflected energy coming in nearly any time period will cause a feeling that you are at some distance from the sound source. However, if you have a sound source that's like speech, or certain types of music, you'll find that you can make a recording that sounds both too close and too far away at the same time. This is a very easy thing to do if you have a lot of late energy and not much early energy. You get this in a recording environment by close‑miking everybody and then adding other mics further back in the hall to pick up the room sound. If you try that, you'll find that the instruments sound like they're in the loudspeakers, and you may or may not have a good hall around you depending on how you miked it. Adding the hall doesn't move the instruments out of the loudspeakers.

"How do you get the instruments out of the loudspeakers? It turns out that adding earlier energy does that pretty well and adding energy over a wider range can do that also. However, looking at the way people perceive energy, if it ends up hitting you before 50mS elapses after the end of the original sound, human perception treats this very differently from if the energy happens after that period. If a reflection happens after 50mS, you can perceive it as a separate thing — perhaps as an echo, depending on the nature of the source. You can also localise it in space, so if you have a reflection coming in at say 150mS, and it's coming from the left side, you'll hear it as coming from the left. That's not true if it comes in the first 50mS.

"If it happens in that first 50mS, the apparent direction of the reflection is greatly influenced by the direction of the source. The reflection may come from the left and somewhat behind you, and it may appear as though it's coming from the left, but this time in front of you. On top of that, anything coming before 50mS is not usually perceivable as a separate sound event, although if you have very short percussive sounds such as rim shots, clicks or handclaps, you may still perceive the reflection separately. For speech and music, you can't separate the reflection from the original sound, which is an advantage for intelligibility. Generally, those reflections coming within the first 50mS don't affect intelligibility.

"Here's where you get into something that's actually very useful, because if you have an early‑reflection pattern that's different in every speaker of a surround setup — the centre speaker is a special case, so we'll say every speaker except the centre — it generates the feeling that you're in an acoustic space. Instead of having the voice in the speaker, it appears to be somewhat behind the speaker. You feel you're in a natural space, but the space has no identifiable size. All you know is that it's a space; it might be a small space, but what's important is that the voice has moved out of the speaker."

So what do you feed to the centre speaker?

"For the 480 surround card, the centre signal is derived from a sum of the left and right rear channels, ideally each attenuated by about 7dB and then summed, so the total energy is about 4dB lower than the energy in each of the other speakers. You could throw in a little delay — 15mS or so, if you want to gild the lily."

The reflection patterns coming from the different speakers, other than the front, don't have to be correlated in any way?

"I'm still using a randomisation technique very similar to what I was using 15 years ago in the 480, which ensures the pattern is always changing, and it's always changing in each speaker in a different way. This is not perceivable, particularly if these reflections happen before 50mS, because your ability to understand where they are coming from is severely inhibited. You don't know where the reflections are coming from, all you know is that you no longer have a close‑miked sound, and this is very desirable. It doesn't affect intelligibility, it just puts in something that was missing. Now you can add more conventional reverb, like the 480, and if you put that so it's similar in all four speakers, but highly uncoloured, you can generate the feeling that you're not close miked in a good hall. But again, there's no effect on intelligibility if you design the profile of the later reverb to avoid the range between say 50mS and 150mS."

Intelligibility Versus Realism

Can't you simply pull down the level of the early reflections in that problem area?

"Obviously you can't do that in a real room, and it's actually quite hard to make algorithms that do that electronically. The profile of the initial part of the reverberation is extremely important, and as stated earlier, if you make it exponential from the very beginning, you have way too much energy in the time range where you don't want it. What's more, by the time you get to the point where you hear reverberation as envelopment, it's decayed considerably so you have to add a lot of the part you don't want to get enough energy in the part where you do want it. In that sense, the shape of the reverberation is very important, which is why we've arrived at a situation where the energy level is fairly flat out to a time range that's adjustable, after which the energy decays.

"Between 50mS and 120mS is probably the worst possible time to get energy from an intelligibility point of view. However, you can't leave a hole there, otherwise you hear the reverberation starting up again afterwards, but you can keep the energy flat in that region out to over 160mS rather than having an exponential decay. You can play around with that time, and the longer that 'flat' part extends, the larger the hall seems to be.

"You make the very early part strong enough to create the sense of distance from the loudspeakers. This strong early energy extends to about 50mS after the sound ends. Then the reflected energy should be lower in level and flat as a function of time till about 150mS, and then finally start to decay exponentially. The difference in total level can be surprisingly large: in many cases, the reflected energy before 50mS may be two or three times greater than the total energy after 50mS.

"The strong very early energy gives you distance without changing intelligibility. Since the level in the 50 to 150mS range is relatively low, the intelligibility stays good, and the relatively high level after 160mS gives you excellent reverberance and envelopment.

"The human hearing system is very insensitive to energy in the region between 50mS and 150mS. That time range seems to be inhibited, except for the effects on intelligibility, which are all negative. It's not a positive time range in terms of reverb, and if you look at most rooms, especially underdamped recording studios, and small auditoriums holding under 1000 people, they all tend to concentrate their energy in that time range. We're all used to that, and we accept it, but it has a number of consequences. If the room is fairly reverberant, that is to say over a second and a half of reverb time or thereabouts, intelligibility will be compromised. If you look at small halls with a reverb time of less than 1.2 seconds, then the strong reverberation in the 50 to 150mS range has a very severe effect on timbre, with everyone sounding as though they're playing through woollen blankets. But you can hear every note, because the decay time is quick enough that it doesn't smear notes."

Speaker Interaction

The current flagship processor in Lexicon's high‑end lineup is the 480L, for which Dave Griesinger is currently developing a surround card.

In surround applications, is it necessary to have any degree of interaction between the late reverb components of the sound fed to the four speakers, or can they be generated completely independently?

"Reverberation is well known to be almost completely uniform in typical halls over those kind of time ranges. No matter where you are in the hall, after a couple of hundred milliseconds, the reverberation tends to be diffuse — it's completely uniform all around you. In a surround sound application, you need to have the reverberation coming to the various speakers incoherent, so there's no correlation and no auto‑correlation over a fairly wide time range. One way of getting incoherence between speakers is simply to delay the sound, and if you were to add a delay of, say, 10mS to one speaker, a phase meter would tell you it was incoherent. However, you don't hear it as incoherent, as the ear auto‑correlates the signal over time. What you need then is incoherent sound in each speaker of roughly the same level and frequency response, if you want to emulate a hall. Now, you can do fancy things by changing timbre, and maybe even the reverb time depending on where you are going around the speakers, and this is very interesting. You can do some wonderful things, but it's not particularly natural. If you want natural surround reverberation, it needs to be the same in all speakers but incoherent. That's not necessarily true in the early‑reflection part of the reverb, and in the 480L surround card, I have the early reflections in front speakers at full bandwidth while the early reflections fed to the rear speakers generally use a reduced bandwidth and are further delayed. However, you can do the experiment by flipping back and forth, swapping the front and rear channels to see if the difference is audible. I haven't been able to hear it. Basically then, you can make it more 'room‑like' by doing this, but it is not clear that anyone hears the difference."

If everything is so uncorrelated between the front and the rear, is there any reason why you shouldn't use a couple of stereo reverbs loaded with complementary programs and use one to treat the front pair of speakers and one the rear pair?

"If you look at the DSP requirements, it's actually easier to make a surround reverb than two independent stereo ones, because certain parts of the algorithm would be combined. In theory you could use two different stereo reverbs, but it's slightly more efficient to do it altogether, and it's certainly easier to use.

"As I said earlier, the early reflections are not audible as such, yet a lot of people are making virtual rooms by calculating the reflections of reasonably small rooms. If the reflection time is under 50mS, that is to say generated by a notional surface around 25 feet from the person listening, you're not going to hear where that reflection came from. Your ability to perceive the size and shape of a small room is very limited. You can do it in a statistical kind of way, and I'd expect a blind person who'd had some training to determine whether a room was long and narrow or square or rectangular, because by turning his head he'd be able to determine whether the major reflection density was lateral or medial. But he'd be doing it by listening to a kind of bulk energy phenomenon. He wouldn't be listening to specific reflections."

Does that mean that instead of creating discrete reflections within that first 50mS period, you could just generate a single reflection with the appropriate coloration?

"Single reflections are dangerous because you can find sound sources where they become audible. One of my favourite recording venues was a very long, narrow church with about 25 feet between the side walls. You could make wonderful recordings in there using a simple microphone pair and still get the feeling that they weren't close‑miked, because if you calculated the strength of that first lateral reflection from the side walls, it was just about the right level and it was a single reflection from a stone wall. But if somebody got out castanets or that kind of percussion, you'd be in trouble, because you'd start hearing the single reflections and they'd become disturbing. But on a vocal chorus, it was wonderful. In a live venue, that single reflection is slightly different for every member of the chorus, which is why it sounds wonderful — there's virtually no comb filtering, because every person has a different reflection pattern. But if you tried to use that with a reverb unit that had a single input, you'd have the same reflection on everybody, and it's inevitable that you'd get comb filtering. That's why we use multiple reflections and it's why we move them around. Multiple reflections also concentrate energy over a wider time range and generally give you less coloration."

Moving Targets

When I last spoke to David Griesinger, we were discussing the role of early reflections in real rooms. Most reverb units create their reflections using what is in effect a multitapped delay line, so every signal fed into the reverb unit creates the same pattern of early reflections. He was saying that in a real room, however, the reflection spacing changes depending on where the performer is located in that room. This means that the simple model produces an early‑reflection pattern that always seems to originate from the same point in space, regardless of the actual stereo positioning of the instruments:

"I think now I'd rather say that in a real room, every sound source has a different timbre, because every pattern of reflection from the viewpoint of a single listener is different for each sound source. Also, typical live musicians will move to some extent, which will tend to change the reflection pattern slightly. Every sound source creates a different pattern of reflections and that pattern is often not stationary. I haven't analysed how much the reflection pattern is modulated in a real room, though that wouldn't be very difficult to do. There are limits as to how fast you can modulate things because then you get into obvious pitch‑change problems, and most real musicians don't move fast enough to create pitching problems unless they really want to — for example the jazz trumpeter moving to create that wonderful Doppler shift."

So how do you create the effect of moving or positioning a sound source in a virtual surround room?

"Primarily, if you're hearing motion at all, it's motion of the direct sound. That would be done by four‑speaker panning. I don't think you need to modify the early reflections as the sound is moved, as the effect is probably not audible. I think the illusion could be made very convincing without having to change the early‑reflection patterns at all. You really don't ever want to end up in a situation where a single reflection comes out of a single loudspeaker. If you do this, you're making the inherent assumption that the listener is always in the sweet spot of the loudspeaker array — but in most cases, the listener is not in the sweet spot, and if somebody is close to a loudspeaker that's being fed a strong reflection, they'll be very startled by this and it will sound very unnatural.

"Unless you have loudspeaker arrays that can generate plane waves rather than point sources, you have the problem that every time you move closer to a loudspeaker, you get the 'one over r squared' change in level. If you have large planar arrays with multiple drivers, you can generate something very close to a plane wave and the sound doesn't get louder as you move closer to the speakers. It can sound really wonderful, but that's not the way home theatre and cinema sound systems work. So, I think you have to modify what you'd like to do in terms of reality to take into account that you want to make the effect plausible over a wide listening area."