Digital Audio

Frequently Asked Questions By Paul White
Published August 2000

The now‑discontinued Alesis AI‑1 was one of very few affordable stand‑alone sample‑rate converters.

Confused by word clock? When is it best to normalise a signal? Are special 'digital cables' a con? Paul White answers these and other questions we're most often asked about digital audio.

The ability to record music and connect equipment digitally should, in theory, provide excellent sound quality with a minimum of hassle, and none of the problems with noise and unreliability that can plague analogue systems. However, life isn't always so simple, and we receive many questions from readers who are having difficulty making their digital setups work properly. In this feature I'll try to address some of the most common causes of digital head‑scratching...

Q. If digital audio is just a matter of storing and retrieving numbers, how is it that some systems sound better than others?

A properly designed master clock such as the Aardvark Aardsync will provide a stable source of sync information for all your other digital gear, eliminating the need to daisy‑chain other equipment.

It's true that digital audio data is simply a stream of numbers, but they must be the right values, and delivered at exactly the right time, if the original audio signal is to be adequately represented. The weakest link in the chain is probably the analogue‑to‑digital converter or 'A to D', the job of which is to sample the incoming audio signal and convert it into a stream of numbers. The part of the circuit that does the measuring is actually a very refined digital voltmeter, and if this doesn't measure the signal accurately, then the numbers will be wrong and the result will be distortion.

The measurements have also to be made at precisely regular intervals (the sample rate), but poor A‑to‑D design, bad circuit board layout, poor clock design and other factors can result in something we call jitter. Jitter occurs when the sampling period varies slightly rather than being constant, the result being that the audio waveform is sampled just before or just after the correct time, again resulting in the wrong value being stored.
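To make that concrete, here's a minimal Python sketch (the test frequency and the amount of timing error are purely illustrative figures, not measurements from any particular converter) showing how a tiny variation in sample timing turns into an error in the stored value:

```python
import numpy as np

fs = 44100.0          # nominal sample rate (Hz)
f = 10000.0           # test tone frequency (Hz) -- illustrative value
n = np.arange(1000)

ideal_t = n / fs                                   # perfectly regular sample times
jitter = np.random.uniform(-1e-9, 1e-9, n.size)    # +/-1ns of timing error (assumed figure)
actual_t = ideal_t + jitter

ideal = np.sin(2 * np.pi * f * ideal_t)            # the values that should be stored
jittered = np.sin(2 * np.pi * f * actual_t)        # the values actually captured

print("worst-case amplitude error:", np.max(np.abs(jittered - ideal)))
```

The faster the waveform is changing at the moment it is sampled, the bigger the error a given amount of jitter produces, which is why jitter is most damaging on high‑frequency content.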

Finally, the sampling theorem according to Nyquist states that for the accurate reconstruction of a signal, the sampling rate must be equal to or greater than twice the highest‑frequency component of the signal being sampled. We can only hear up to 20kHz or so, so a sampling rate of 44.1kHz or 48kHz is adequate to reproduce the audible component of any source signal. However, the source signal being sampled may well contain frequencies above 20kHz — the harmonics produced by plucked metal strings, bells or cymbals can go much higher than that — and since these cannot be accurately reproduced by a digital system operating at 44.1 or 48kHz, they must be removed before sampling takes place, or they will yield audible artifacts through a process known as 'aliasing'.

Various filtering strategies are used to remove these frequencies, all of which involve the construction of a very steep low‑pass filter, but no filter is perfect, and the steeper the response the more the filter tends to 'ring' or resonate at the cutoff point. This ringing can impart a harsh quality to the sound, a fault much in evidence in first‑generation CD players of the early 1980s, but modern converters are much better in this respect thanks to technologies such as oversampling, single‑bit converters and improved filter design.
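If you want to see aliasing in action, the following Python sketch (using an arbitrary 30kHz test component) shows that, once sampled at 44.1kHz, a tone above the Nyquist limit produces exactly the same stream of numbers as a tone below it:

```python
import numpy as np

fs = 44100.0                      # sample rate
t = np.arange(32) / fs

above = np.sin(2 * np.pi * 30000.0 * t)          # 30kHz component, above Nyquist (22.05kHz)
alias = np.sin(2 * np.pi * (30000.0 - fs) * t)   # -14.1kHz, i.e. a 14.1kHz tone with inverted phase

# The two sequences of stored numbers are identical, so the converter cannot
# tell a 30kHz input from a 14.1kHz one.
print(np.allclose(above, alias))   # True
```

Since the stored numbers are indistinguishable, no amount of processing after the event can tell the two apart, which is why the filtering has to happen before conversion.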

Q. Why might I need a sample‑rate converter?

Sample‑rate converters (SRCs) are necessary to convert one sample rate to another within the digital domain when passing signals between different pieces of equipment. For example, a recording made on a 48kHz sampling‑rate DAT recorder needs to be converted to 44.1kHz for CD production. Most sample‑rate converters are asynchronous, which simply means that the incoming clock rate doesn't have to be related to the output clock rate. Even if the input sample rate drifts up or down slightly, the output of a properly designed asynchronous sample‑rate converter will be constant.
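The arithmetic behind the common 48kHz‑to‑44.1kHz conversion reduces to a ratio of 147:160. As a rough illustration only (real converters use their own proprietary filter designs), here's how that ratio could be applied with SciPy's polyphase resampler in Python:

```python
import numpy as np
from scipy.signal import resample_poly

# One second of a 1kHz test tone recorded at 48kHz (illustrative signal)
fs_in, fs_out = 48000, 44100
t = np.arange(fs_in) / fs_in
x48 = np.sin(2 * np.pi * 1000 * t)

# 44100/48000 reduces to 147/160: interpolate by 147, low-pass filter, decimate by 160
x441 = resample_poly(x48, up=147, down=160)
print(len(x48), "->", len(x441))   # 48000 -> 44100
```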

The mathematics involved in designing a good SRC algorithm are more complex than you might initially imagine, and different designs vary in quality. For example, a poor design may cause additional low‑level frequency components to be present in the output signal that were never there in the original. Though the level of these artifacts is generally low, there can be an audible effect on overall audio quality and stereo imaging.

In a system where several independent digital sources need to be mixed, asynchronous SRCs are often used to bring them all into alignment with a common clock, usually that of the digital mixer. Even if all the sources are at the correct sample rate, the fact that each piece of equipment has its own crystal clock means that the outputs will all drift slightly relative to each other, but by processing these signals via a number of SRCs controlled from a common clock, the data streams can be synchronised.

If you need to sample‑rate convert but don't have access to a sample‑rate converter (either hardware or as part of a software package), you can still connect the two pieces of equipment via their analogue connections. In theory this will cause some signal degradation, and you have to take care to match the levels, but in practice, the quality drop is usually minimal, and often less than would occur using a poorly designed sample‑rate converter.

Q. Why do I have to buy special cables to use with digital S/PDIF connections? Is this just a con to get us to spend more money when audio phono cables would work fine?

The reason cheap audio cables don't work properly is that digital audio data takes up a much greater bandwidth than analogue audio. At the high frequencies involved, impedance mismatches reflect back some of the signal along the cable and these 'echoes' compromise the signal‑to‑noise ratio of the system such that at some point, zeros may be misread as ones or vice versa. To do the job properly, 75Ω digital cable is required as this minimises reflections and maintains the signal's integrity.

Q. Are ADAT optical cables compatible with S/Pdif optical cables?

While both these systems use similar cabling and the same transmission and reception hardware, their data formats are, in fact, very different. So there's no point in feeding a stereo S/PDIF signal into an ADAT interface port and hoping the signal will appear on two of the channels. You need a special box to do this job.

Q. Do optical cables differ in quality the same way as co‑axial cables?

Yes they do. There are two areas where optical signals can be degraded — at the termination and in the cable itself. The optical quality of the cable, or lightpipe as it's sometimes called, determines the distance that the signal will be able to travel before errors become a problem. The terminations affect how efficiently light enters or leaves the cable. If you need cable lengths of more than a couple of metres, it's worth investing in high‑quality optical cables rather than using the budget ones that come with most pieces of digital audio equipment.

Q. What's the difference between S/PDIF and AES‑EBU digital connections other than the type of connector?

Both are stereo audio formats and both transmit the audio data in the same form, though S/PDIF is unbalanced and AES‑EBU is balanced. However, additional information is carried along with the audio and S/PDIF carries some consumer data that AES‑EBU does not. For example, S/PDIF carries data relating to the SCMS copy protection system, and can also carry DAT and CD track IDs. AES‑EBU does not recognise this data, which can be both an advantage and a disadvantage. It's an advantage when you want to work on material that has SCMS copy flags set to prevent copying, as these are ignored by the AES‑EBU interface. On the other hand, you may want to use DAT IDs to trigger automatic track ID creation in a CD writer, but these will also be stripped out if you use the AES‑EBU connection.

Another difference is that the nominal operating level of S/PDIF is around half a volt, whereas AES‑EBU is around ten times greater at five volts. Special adaptor cables are available to convert AES‑EBU to S/PDIF, allowing the audio data to be transferred between formats.

Q. What is word clock and do I need it?

Normally the sync information for a digital signal is embedded in the data stream, but it's also possible to use an external master clock to synchronise the various pieces of equipment in a digital audio system. As the name implies, the master clock takes over the sync functions for all the slave equipment connected to it. The way it works is that the master clock generator feeds a number of separate, buffered outputs that in turn connect to the word clock inputs of the equipment being synchronised. Each piece of slave equipment is then switched to word clock sync mode so that it locks to the incoming master word clock rather than to its own internal oscillator.

Using word clock has a number of advantages, not least that it avoids long daisy‑chains of equipment. Long chains of this type can introduce clock instability as each piece of equipment is trying to lock to the one before it, and in extreme cases, this can result in either glitches or a total loss of sync. The other main advantage is that a high‑quality, low‑jitter master clock can be used to control everything, which should result in better audio performance.

If you don't have a separate master clock generator, use the word clock output of the device where your analogue‑to‑digital converters are situated, which will usually be your mixer. A separate clock splitter may be needed to provide you with sufficient clock feeds.

Q. Once a signal has been converted to digital, is it safe from further deterioration, always assuming the connecting cables are not causing any problems?

That's a tricky one to answer, because most digital systems don't just pass the numbers through unchanged, but instead process them in some way. For example, a digital mixer may change levels, add EQ, adjust the pan position and so on. Even an apparently simple thing like changing the level of a signal can introduce small errors, because to change a level, the stored audio data has to be multiplied or divided by some factor. This invariably results in rounding‑up and rounding‑down errors where the mathematical sums don't work out tidily, though a well‑designed digital mixer will have an internal data path several bits wider than that of the original signal in order to provide some digital headroom for this type of calculation. Nevertheless, when the signal is finally returned to its original bit depth, some degree of distortion and noise may have been added due to the processing.
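A simple Python sketch makes the point; the gain figure and the notional 'wider' working precision are purely illustrative, not how any particular mixer does it:

```python
import numpy as np

rng = np.random.default_rng(0)
samples16 = rng.integers(-32768, 32767, 1000)     # some 16-bit sample values
gain = 0.8                                        # a level change of roughly -2dB

# Scale and round straight back to 16-bit steps
scaled16 = np.round(samples16 * gain)

# Scale within a working space 8 bits wider (a 24-bit-style internal path),
# so the rounding happens on a much finer grid
scaled_wide = np.round(samples16 * gain * 256) / 256

exact = samples16 * gain
print("max error, 16-bit path:", np.max(np.abs(scaled16 - exact)))
print("max error, wider path :", np.max(np.abs(scaled_wide - exact)))
```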

Q. At what stage during editing should a signal be normalised, if at all?

Whether you normalise or not depends on whether you plan to use a limiter that can also increase the signal level as part of the process. If you have one of these, for example the Waves L1, then you probably don't need to normalise at all. In any event, it's not a good idea to normalise prior to doing other processing because any type of subsequent processing can cause a gain increase, and if the signal is already normalised (meaning the peaks are already at maximum level), that can result in audible clipping. Even something apparently benign, such as EQ cut, can cause peak level increases of several dB if you happen to cut a frequency that had previously been cancelling another frequency.
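Here's a deliberately simplified Python illustration of that last point: two components of the same frequency partially cancel, the mix is normalised, and then 'cutting' the cancelling component pushes the survivor well over full scale:

```python
import numpy as np

t = np.arange(44100) / 44100.0
a = np.sin(2 * np.pi * 100 * t)            # 100Hz component
b = -0.4 * np.sin(2 * np.pi * 100 * t)     # same frequency, opposite polarity: partial cancellation

mix = a + b                                # peak is 0.6 of full scale
peak = np.max(np.abs(mix))
normalised = mix / peak                    # peaks pushed right up to full scale (1.0)

# Remove the cancelling component from the normalised mix: the surviving
# part is now about 1.67 times full scale, so it would clip.
after_cut = a / peak
print("peak after cut:", np.max(np.abs(after_cut)))
```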

Q. What is noise‑shaped dither and when should I use it?

Dither is a process used when reducing the bit depth of an audio signal, for example, when changing from 24 to 16 bits. Because audio processing can produce signals with wordlengths greater than 16 bits (providing the digital headroom has been incorporated to do this), some bit‑depth reduction may have to take place when preparing an edited audio file for CD production.

The simple way to reduce bit depth is to throw away the least significant bits until the right number of bits remain (truncation), but this also discards some low‑level detail. If you listen to the end of a fade where the audio has been truncated, it becomes quite coarse‑sounding as the last few bits switch on and off, but a significant improvement can be made by adding dither noise sufficient in magnitude to keep the least significant bit constantly changing. This has the effect of allowing the signal to disappear smoothly into the dither noise rather than fizzling out as it would if truncated, but as dither is effectively noise, a small compromise in the signal‑to‑noise ratio is unavoidable.
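The following Python sketch (using a simple triangular dither of about one least‑significant bit, which is only one of many possible dither schemes) compares plain truncation with dithering on a quiet fade‑out:

```python
import numpy as np

rng = np.random.default_rng(1)

# A quiet 440Hz fade-out, held as floating-point samples in the range -1..1
t = np.arange(44100) / 44100.0
fade = np.sin(2 * np.pi * 440 * t) * np.linspace(1e-3, 0.0, t.size)

LSB = 1.0 / 32768.0          # size of one 16-bit step

def truncate_to_16bit(x):
    # Simply discard everything below the 16-bit step size
    return np.floor(x / LSB) * LSB

def dither_to_16bit(x):
    # Add triangular (TPDF) dither of about one LSB before rounding,
    # so the last bit keeps changing even when the signal is tiny
    noise = (rng.random(x.size) - rng.random(x.size)) * LSB
    return np.round((x + noise) / LSB) * LSB

truncated = truncate_to_16bit(fade)
dithered = dither_to_16bit(fade)

# In the last few milliseconds the truncated fade is stuck on a couple of
# fixed levels that follow the waveform, while the dithered version keeps
# toggling randomly within the noise.
tail = slice(-220, None)
print("distinct levels, truncated tail:", np.unique(truncated[tail]).size)
print("distinct levels, dithered tail :", np.unique(dithered[tail]).size)
```

In the truncated version the error follows the waveform, which the ear hears as distortion; in the dithered version it behaves like a constant low‑level hiss.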

Noise‑shaped dither gets around this by using mathematically calculated dither noise designed to occupy the 15‑to‑20kHz part of the audio spectrum where the human ear is relatively insensitive. The result is that more of the dynamic range of the original signal is preserved without increasing the subjective noise level.

It is important that dither is always added last, because any further processing will only defeat the object of the dither process. Furthermore, if you subsequently redither, you may end up in a situation where that 'invisible' noise in the 15 to 20kHz part of the spectrum builds up and starts to cause problems.

Q. Can I mix two digital signals using a Y cable?

No. There's a lot of maths involved in digital mixing and a simple Y cable just doesn't have the education!