Digital Interconnection Standards Explained

Exploration By Hugh Robjohns
Published December 1996

Every studio musician knows at least enough about analogue connections to get a standard recording system up and running. When it comes to digital, though, we're often in the dark about what will work with what. Hugh Robjohns raises the digital standard...

There can be few readers who do not already own at least one piece of equipment which records or processes audio in the digital domain, even if this is only a reverb unit or sampler. Cassette‑ and tape‑based recorders are being usurped by MiniDisc and hard disk recording systems, virtually all reverberation and signal processors are now DSP (Digital Signal Processor) based devices, and digital mixing desks are now becoming much more affordable, with the likes of the Yamaha ProMix 01 and 02R, the Korg 168RC (reviewed in this very issue), and the Soundtracs Virtua.

So it makes good sense to start connecting all this stuff together, to keep the audio signals in the digital domain, rather than continually converting between analogue and digital formats. In principle, this is not a problem (although it is certainly not as straightforward as analogue interconnection), but there are a number of issues which you should be aware of.

Analogue Interconnection

We're all used to connecting various bits of analogue equipment together, and I'm sure everyone has had to make up 'bodge leads' at some time, to interface various non‑standard bits of equipment.

Analogue audio can be balanced or unbalanced, can operate at nominal +4dBu or ‑10dBV signal levels, and uses a variety of connectors (XLRs, jacks, phonos). However, most interfacing problems can be solved by little more than wiring up a connector in a suitable way. Occasionally, we might have to use a transformer or a resistive pad to convert between balanced and unbalanced systems, or to match signal levels, but this is relatively rare. In other words, analogue interfacing is usually a matter of simple mechanics or basic electrics and is generally well understood.

Digital interfacing could not really be described as simple, and is not very well understood at all. In general terms, digital interface problems cannot be solved with a soldering iron and suitable connectors, but need carefully designed, sophisticated high‑speed electronics to convert between data formats and protocols. Such boxes are available, of course, but they are often fairly expensive, and it would make sense to avoid the need for them by buying equipment which shares a common interface format where possible.

Digital Interface Formats

A digital interface is rather more complex than an analogue one, because of the discrete nature of digital audio. Not only does the interface need to pass audio data on one or more audio channels between source and destination, but it also has to ensure that the data is decoded correctly when it gets there.

The sort of problems that emerge between different interface formats concern more than just what type of physical connector is used. Common issues include how many bits describe each sample (16, 20, 24?); whether the digital data stream is encoded LSB (Least Significant Bit) or MSB (Most Significant Bit) first (does it send the right‑most or the left‑most digit of the sample word first?); where each sample starts and ends; and which part of the data stream is carrying the left audio channel and which the right.
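
To make the bit‑order point concrete, here is a minimal Python sketch (purely illustrative, and not the framing of any real interface) showing how the same 16‑bit sample word becomes two quite different bit streams depending on which end is sent first:

    def serialise(word, width=16, lsb_first=True):
        """Turn a sample word into the stream of bits a serial interface would send."""
        positions = range(width) if lsb_first else range(width - 1, -1, -1)
        return [(word >> p) & 1 for p in positions]

    sample = 0b1010000000000001  # an arbitrary 16-bit sample word

    print(serialise(sample, lsb_first=True))   # right-most digit first: 1,0,0,...,0,1,0,1
    print(serialise(sample, lsb_first=False))  # left-most digit first:  1,0,1,0,...,0,0,1

A receiver expecting the opposite order would reconstruct a completely different number from the same stream, which is why bit order is part of every interface specification.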

In the early days of digital audio, each manufacturer designed their own interface, so that purchasers were locked into buying other equipment from the same manufacturer. There were basically three interface formats: Sony, Melco (Mitsubishi), and Yamaha, although only the Sony (SDIF) and Yamaha (Y1 or Y2) formats are still common today. As you might expect, these are all mutually incompatible. The SDIF‑2 system is unbalanced over three BNC connectors (left, right and word clock), sending up to 20 bits, MSB first. The Yamaha format uses an 8‑pin DIN socket, and transmits balanced serial audio data (left‑right‑left‑right) with up to 24‑bit resolution, LSB first. Its balanced word clock is arranged to identify both the start and end of the audio samples, as well as indicating left and right channels (low for left and high for right).

Standards: We Have Lots

To introduce some form of standardisation, the Audio Engineering Society (in America) and the European Broadcasting Union designed a connection format which would suit the needs of everyone — at least as far as stereo audio transfers were concerned. They came up with the AES‑EBU format, which sends a single, balanced data stream over 3‑pin XLR connectors. The data stream is a combination of left and right audio with an embedded word clock, and the audio data segments carry additional information such as pre‑emphasis modes and sampling rates.
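
As a rough illustration of how two audio channels, a clock and status information share a single cable, here is a Python sketch of the commonly documented AES‑EBU subframe layout. Treat it as an outline only: in the real interface every time slot is also bi‑phase‑mark encoded (which is what lets the receiver recover the word clock from the data itself), and the preambles are special waveform patterns rather than ordinary data.

    def build_subframe(audio, channel, block_start=False, validity=0, user=0, status=0):
        """Sketch of one 32-slot AES-EBU subframe; two subframes make one stereo frame."""
        audio &= 0xFFFFF                 # slots 8-27: up to 20 bits of audio sample
        preamble = "Z" if block_start else {"left": "X", "right": "Y"}[channel]
        parity = (bin(audio).count("1") + validity + user + status) & 1
        return {
            "preamble": preamble,        # slots 0-3: sync pattern, also flags left/right
            "aux": 0,                    # slots 4-7: aux data (or four extra audio bits)
            "audio": audio,              # slots 8-27: the sample itself
            "V": validity,               # slot 28: sample-validity flag
            "U": user,                   # slot 29: one bit of user data
            "C": status,                 # slot 30: one bit of the channel-status block
            "P": parity,                 # slot 31: even parity over slots 4-31
        }

The single C bit per subframe is how that additional information travels: collected over a block of 192 frames, those bits build up 24 bytes of channel status carrying the pre‑emphasis and sampling‑rate flags mentioned above.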

The AES‑EBU format was subsequently modified to form another standard for use in domestic applications like CD players and DAT machines — the S/PDIF (Sony/Philips Digital InterFace) format. This is an unbalanced version of AES‑EBU, using phono connectors and carrying slightly different status information. This same data format is also transmitted optically using Toshiba's Tos‑Link connectors on many domestic CD players.

In the world of digital multitracks, there is still no standard interface format. Sony designed a version of SDIF for their early 24‑track DASH machines; this uses D‑sub connectors to carry 24 channels of audio as individual balanced signals (broadly conforming to the RS422 standard). The latest 48‑track DASH machines from Sony and Studer are normally connected using a version of the AES‑EBU system called MADI, which transmits 56 audio channels down a single co‑axial or optical connection.

Coming closer to earth, the two popular digital multitrack formats, ADAT (from the Alesis ADAT digital 8‑track tape recorder) and DTRS (as found on Tascam's competing DA88 8‑track) also have incompatible connection formats. The ADAT uses a pair of Tos‑Link optical leads to carry record and replay feeds down separate cables. Eight channels of audio data are multiplexed over each lead. The DTRS format uses a D‑sub connector to transfer data, employing an 8‑channel format called T‑DIF.
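
The multiplexing itself is straightforward time‑division interleaving. Here is a generic Python sketch (not the actual ADAT framing, which adds its own sync and user bits) of eight channels sharing one serial stream:

    # Eight channels of samples, labelled "c<channel>s<sample>" for clarity
    channels = [[f"c{c}s{s}" for s in range(4)] for c in range(8)]

    # Time-division multiplex: one sample from each channel in turn, over and over
    stream = [sample for time_slice in zip(*channels) for sample in time_slice]

    print(stream[:8])  # first cycle: ['c0s0', 'c1s0', 'c2s0', ..., 'c7s0']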

Practical Interfacing

You should now be able to see the scale of the task of connecting various items of digital equipment together. Typically, digital multitracks will use either ADAT or T‑DIF formats (top‑end systems might use SDIF, PD — Mitsubishi's Pro‑Digi format — or MADI), but much of Yamaha's equipment still relies on its own proprietary format. Stereo equipment is generally interfaced using the co‑axial or optical versions of S/PDIF (for semi‑pro and domestic equipment) or AES‑EBU (for professional devices). Yamaha equipment can often make use of the Y2 format for interconnection between mixer and effects units.

Fortunately, most digital mixers are capable of providing some level of format conversion, even if this is through optional modules or add‑on units. For example, the Soundtracs Virtua has, as standard, facilities for stereo AES‑EBU in and out (for mastering) and 16 channels of ADAT optical interfacing. Optional units are available to convert the ADAT format to either T‑DIF or AES‑EBU. Similar facilities are available for Yamaha's 02R desk and all of the high‑end digital consoles. If greater format flexibility is needed, various manufacturers offer dedicated format converters — Otari, for example, have a very flexible 24‑channel unit which converts between SDIF, T‑DIF, ADAT, PD and AES‑EBU, and Spectral's Translator Plus offers ADAT, T‑DIF, Y2 and AES‑EBU.

Hard disk editing systems vary in the digital connection format offered. The Studio Audio & Video SADiE system, for example, uses AES‑EBU and S/PDIF, Sonic Solutions uses S/PDIF over Tos‑Link, and many Digidesign Pro Tools systems are equipped with full ADAT multi‑channel interfaces. The only advice I can give is to choose your items of equipment carefully to ensure they share compatible interface standards. Or, failing that, invest in a format converter, or the additional interface modules for your mixer!

Problems

So, assuming we've found suitable interface formats to link all the equipment together digitally, what else should be considered?

Without a doubt, the most critical thing is timing. Everything must be clocked at precisely the same time, so that audio samples are exchanged when the equipment is ready for them. Failure to do this will result in occasional clicks and 'splats' in the audio. Most of the current interface formats actually have a word clock embedded with the audio data, but it often pays to use a separate BNC lead to carry the word clock between the source and destination equipment to make sure.

This introduces another problem: which machine should clock which? Professional digital studios normally have a single central word clock generator, and its clock signals are distributed to all the other equipment (mixer, effects units, CD and DAT replay devices, ADCs (Analogue‑to‑Digital converters), sample rate converters, and so on) so that everything is running at precisely the same rate. In a smaller installation, the mixer is often the best clock reference, and the other equipment should be synchronised to it by distributing its word‑clock signal. Although clocks can be daisy‑chained between equipment, it is much better to use a word clock distribution amplifier.

Digital recorders connected to the desk do not usually require separate clocks, since they automatically lock to their inputs during recording. However, be warned that many hard disk, MiniDisc and DAT machines use their internal clock systems during playback and cannot be locked to an external reference — check before you buy.

Timecode

If you're using timecode in your system, it's absolutely essential that it is locked to the sample rate, just as it must be locked to the picture frame rate. The rules for digital audio state that there must be a whole number of samples in a video frame, and therefore in a timecode frame. If you're using timecode generated from a digital mixer (for automation purposes) or from a DAT machine, timecode and sample rate will be automatically locked, but if your timecode comes from elsewhere, make sure it's locked to the same reference as the rest of the digital system. Often this means using a video reference (off‑air BBC 1 is about as stable as it gets) to lock the timecode generator and master word‑clock generator together.
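
The whole‑number rule is easy to check with a little arithmetic. A quick Python sketch, using the sample rates discussed in this article and the frame rates listed in the glossary (29.97fps is really 30000/1001; combinations which do not give a whole number per single frame have to repeat evenly over a short run of frames instead):

    # Samples per timecode frame: the sums only work if the answer is a whole number
    for fs in (44100, 48000):
        for fps in (24.0, 25.0, 30.0, 30000 / 1001):
            spf = fs / fps
            verdict = "whole" if spf.is_integer() else "NOT whole"
            print(f"{fs}Hz at {fps:.2f}fps -> {spf:.2f} samples per frame ({verdict})")

At 48kHz and 25fps, for example, there are exactly 1920 samples per frame, so timecode and word clock can stay permanently in step.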

Sample Rates

There are two basic options: 44.1 and 48kHz. If you are producing material for commercial release, it should be at 44.1kHz, whereas sound for film or video should be at 48kHz. Many professionals prefer to run hard‑disk editing systems at 44.1kHz because of the extra 8% storage time over 48kHz, even if this means having to sample rate‑convert for video lay‑backs. If you do not have a sample rate converter, do not worry — I have yet to find anyone who can spot a single analogue transfer using the decent modern A/D and D/A converters of most DAT machines or hard‑disk systems. At the end of the day, you have to take a pragmatic view of the world, and analogue audio is the universal sample‑rate converter!
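
The storage figure is simple ratio arithmetic, and comes out the same whatever the bit depth; a quick check in Python:

    # Bytes per second of 16-bit stereo: sample rate x 2 bytes x 2 channels
    rate = lambda fs: fs * 2 * 2

    print(f"{1 - rate(44100) / rate(48000):.1%}")  # 8.1% less data at 44.1kHz, or...
    print(f"{rate(48000) / rate(44100) - 1:.1%}")  # ...8.8% more space needed at 48kHz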

Non‑Standard Standards

One of the most frustrating aspects of digital audio is connecting equipment together using non‑standard AES‑EBU. It is surprisingly common to find domestic S/PDIF data appearing on professional‑looking AES‑EBU sockets on CD players and DAT machines. This is because S/PDIF output chips are plentiful and cheap, whereas AES‑EBU devices are not! Relatedly, because the basic data structure of the S/PDIF and AES‑EBU formats is identical, and receivers are generally sensitive enough to cope with either signal, it is often possible to wire up a lead linking an S/PDIF phono plug to an AES‑EBU XLR connector with successful results. In either of these cases, the audio transfer itself will usually succeed, but the extra hidden data can play annoying tricks.

Professionals have no need for copyright flags, because they all pay for the privilege of using protected material, but there are at least two different pre‑emphasis standards in use which need to be identified. On the other hand, there is only one domestic pre‑emphasis system, and copyright material must be identified to prevent illegal copying. All this information is transferred using the same chunk of status data within the audio data stream, and so is open to misinterpretation. The most common problem is for a professional unit to be indicating no pre‑emphasis in use, while a domestic recorder interprets the data as copy‑prohibited, and consequently refuses to go into record.
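
To see how the misinterpretation happens, here is a hedged Python sketch, assuming the commonly documented channel‑status bit assignments (byte 0, with bit 0 flagging professional or consumer use). A professional transmitter signalling 'emphasis not indicated' leaves bit 2 at zero, and a domestic recorder which ignores the professional flag reads that same zero as 'copyright asserted':

    def decode_byte0(b, treat_as):
        """Read the first channel-status byte two ways (simplified bit assignments)."""
        if treat_as == "professional":
            emphasis = (b >> 2) & 0b111  # bits 2-4: 000 = not indicated, 100 = none
            return f"professional: emphasis code {emphasis:03b}"
        copy_bit = (b >> 2) & 1          # consumer bit 2: 0 = copyright asserted
        return "consumer: " + ("copy permitted" if copy_bit else "copyright asserted")

    byte0 = 0b00000001  # professional flag set, emphasis field all zeros
    print(decode_byte0(byte0, "professional"))  # professional: emphasis code 000
    print(decode_byte0(byte0, "consumer"))      # consumer: copyright asserted!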

The more frustrating problem for many is SCMS (Serial Copy Management System), which was designed to reduce music piracy. All domestic digital recording equipment has SCMS chips, which will allow one digital copy to be made from a copyright source, but prohibit further copies from the second‑generation tape. The problem is that many systems erroneously impose a copyrighted status on original recordings, which is a continual source of frustration for home musicians trying to overdub or make dub‑edits.

It's often possible to configure equipment such as mixers and hard disk editors to modify their output status data to overcome these kinds of problems, or alternatively there are specialised units on the market which can analyse and change the data as it passes through. Some 'mastering processors' also have this kind of facility, which often helps to justify their purchase! It's worth studying your existing equipment handbooks to see if these facilities are available, or ask the questions when you are looking at new equipment.

Hugh Robjohns is a lecturer at the Centre for Broadcast Skills Training at BBC Wood Norton. The views expressed in this article are the author's own and are not necessarily those of his employer.

Digital Downer 1: Jitter

The worst thing you can do when plugging up a digital system is use an unstable source as the timing reference for the entire system (see the 'Problems' section above). It's easy to do — you might have a domestic CD player you want to connect, but it has no external reference clock input. So you might connect this to a mixer input and configure the mixer to lock to the CD player's output, with all the other equipment being clocked from the desk. In this situation, any variation in the CD player's clock timing (called jitter — the digital version of wow and flutter, if you like) will be passed through the complete system, corrupting the master recording.

Jitter can be caused in other ways, too, the most common being poor quality or excessively long cables. All cables have capacitance between the signal conductors and the earthed screen, which tends to absorb high‑frequency energy from the signals being passed through the cable. Cheap cables tend to have high capacitance (ideally, you need something with 40pF/m or less for digits), which makes the effect worse. Since digital signals are basically rapidly‑changing square waves with harmonics running up to several tens of megahertz, cable capacitance soaks up these high frequencies, so that the nice square signal entering the cable emerges from the other end as rounded or triangular shapes. This sloping of data edges can cause confusion in the decoding circuitry, which results in the data being reconstructed at the wrong time. The audible effects of jitter include an increase in high‑frequency noise (often mistakenly identified as extra brightness) and unstable or muddy stereo images.
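
A back‑of‑envelope calculation shows why capacitance matters. Treating the cable run as a simple lumped RC filter (a crude model, and the figures below, such as 150pF/m for a cheap lead and a 75Ω source, are illustrative assumptions rather than measurements) gives a corner frequency of f = 1/(2πRC):

    import math

    def corner_hz(source_ohms, pf_per_metre, metres):
        """-3dB point of a crude lumped-RC model of a digital cable run."""
        capacitance = pf_per_metre * 1e-12 * metres
        return 1 / (2 * math.pi * source_ohms * capacitance)

    # A 5-metre S/PDIF-ish run from a 75-ohm source: cheap lead vs decent cable
    for pf in (150, 40):
        print(f"{pf}pF/m over 5m: roll-off from about {corner_hz(75, pf, 5) / 1e6:.1f}MHz")

With a stereo S/PDIF stream running at around 3Mbit/s before bi‑phase encoding, the cheap lead's roll‑off sits right on top of the data, which is why the edges arrive rounded.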

S/PDIF signals struggle to get through more than a couple of metres of even the best cable, and the standard give‑away Tos‑Link optical leads are no better. AES‑EBU can survive a couple of hundred metres down decent low‑capacitance cable, but every connector causes some internal reflections, so avoid linking lots of short lengths together. Multitrack connections often run at much higher rates than simple stereo connections, so cable problems tend to be even worse. The best advice is to keep all digital connections as short and direct as possible, and always use decent cables.

Digital Downer 2: Clock Howl‑Round

Another problem which can catch out the unwary is clock howl‑round. If you configure the digital mixer to lock to a DAT machine's output, and then plug the mixer's digital outputs into the DAT machine and start recording, you have a clock loop. The mixer is locked to the DAT, but while recording, the DAT is also locked to the mixer — they start to chase each other's tails! There are a number of possible symptoms, the most obvious being that the system just stops (many mixers will simply shut down until the loop is removed); you might also encounter very loud howl‑round noise! In either case, it's pretty clear that your last connection was not a wise one, and the loop can be traced and removed.

Unfortunately, some systems behave in a much more underhand way, gradually speeding up or slowing down until one piece of equipment reaches its clock‑frequency limit. This can be hard to spot, because unless you're replaying familiar pre‑recorded material, you might not recognise the gradual change of speed. However, when you replay the master DAT, it will be all too obvious. Keep an eye on the sample‑rate indicators of each machine to make sure everything is running at the rate you think it should be.

Digital Glossary

  • BALANCED/UNBALANCED: When any signal is represented as a varying voltage, there must be some reference point from which measurements can be taken. In unbalanced systems, the reference is ground (or earth) at zero volts, and it is passed between machines on the outer screen of the interconnecting cable. Unbalanced operation is cheap, but is prone to interference if the cable runs are long or the signal is small, and to hum loops, where the reference signal can pass between equipment via multiple paths. Balanced signals do not require a separate reference point — instead, an equal but opposite copy of the wanted signal is used as the reference. This system has the advantage that interference tends to be rejected more efficiently, and hum loops cannot be created, but it requires more complicated cabling and interface circuits. Most professional analogue equipment uses balanced connections, and the same is now becoming true for digital interfaces too (AES‑EBU is balanced, S/PDIF is unbalanced).
  • BIT RESOLUTION: The number of bits (Binary digITS) used to describe an audio sample determines its quality. The more bits, the less the quantisation noise and the better the end result, but the greater the demand on storage space and processing power. CD and DAT use 16 bits, but there is increasing pressure to master at 20 bits or more. The holy grail of digital audio mastering is a 24‑bit system, which would be able to capture the dynamic range of real life (144dB) without the need to set levels first! (The arithmetic behind these figures is sketched after this glossary.)
  • DASH (DIGITAL AUDIO STATIONARY HEAD): This is the de facto standard for large open‑reel digital multitrack recorders, originally developed by Sony, but now also produced by Studer and Tascam. The format covers everything from 2‑track machines up to 48‑track, with the standard sampling rates and resolutions of up to 24 bits on the latest machines.
  • MSB/LSB: The most and least significant bits. The least significant bit is worth a count of 1, whereas the most significant bit is worth a count of 32768 (in a 16‑bit system); see the sketch after this glossary. Some early systems transferred data MSB first, but most modern interfaces are LSB first, which makes the number‑crunching easier and faster.
  • PRE‑EMPHASIS: In the early days of digital technology, A/D and D/A converters were relatively noisy. One way to overcome this problem is to boost high frequencies before A/D conversion (pre‑emphasis) and then reduce them by the same amount after D/A conversion (de‑emphasis). In this way, high‑frequency noise from the converters will appear to be reduced too — a broadly similar technique to that of Dolby B noise reduction. De‑emphasis facilities are incorporated into all CD and DAT machines, but very few modern recordings or DAT machines make use of them these days, because converter technology has improved dramatically.
  • SAMPLE RATE CONVERSION: The process of mathematically converting between digital data streams at differing sample rates — for example, from a DAT recording at 48kHz to a CD production master at 44.1kHz. Sample rate conversion is theoretically a perfect and lossless process, but requires some serious digital signal‑processing to perform the task in real time!
  • TIMECODE: SMPTE/EBU timecode is an audio signal of modulated 1kHz and 2kHz square waves. The modulation process encodes timing information related to picture frames (standard rates are 24, 25, 30 and 29.97 frames per second). Timecode is also available within the MIDI system, and as a sequence of black and white dots at the top of a picture frame in many professional video systems (vertical interval timecode, or VITC). There must be a fixed relationship between picture frames, timecode and the digital audio sampling rate.
  • WORD CLOCK: Often abbreviated to WCLK. The word clock is normally a square wave running at the sampling frequency. It provides the means of breaking a continuous binary sequence down into the correct 16‑bit sample chunks, and also of identifying which samples describe the left channel and which describe the right in a stereo system. An analogy might be to equate its function to the spaces between words. Without spaces, itisverydifficulttomakesenseofthecompletesentence!
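
As promised above, a short Python sketch of the glossary arithmetic: the weight of each bit in a sample word, and the rule of thumb of roughly 6dB of dynamic range per bit (the exact figure is 20 x log10(2), about 6.02dB):

    import math

    BITS = 16
    print(f"LSB is worth {2 ** 0}, MSB is worth {2 ** (BITS - 1)}")  # 1 and 32768

    db_per_bit = 20 * math.log10(2)  # about 6.02dB per bit
    for n in (16, 20, 24):
        print(f"{n}-bit system: about {n * db_per_bit:.0f}dB dynamic range")
    # 16 bits gives ~96dB (CD and DAT); 24 bits gives the ~144dB quoted above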