By Anthony P. Kuzub, Ward-Beck Systems
The audio and video industry is experiencing an IP renaissance. With the deployment of AES67 networked audio solutions, certain terms are due for a revisit. Though these terms are commonly used in analog and digital transmission, with AES67 and SMPTE ST 2110-30 the mechanisms of moving audio are completely reinvented, and the terms must follow. The obsolete audio terms below are discussed, each with an update offered for its IP equivalent.
A single microphone on stage in a performance has to feed the talent, the audience, the recorders, and the broadcaster. For decades we have relied on iron-core isolation transformers and miles of fine copper to perform this splitting of delicate signals. Lately, these same electrical primitives have been in use between the front ends of multiple digital mixing consoles, forced there by differing sample rates, differing clocks, and vendor-proprietary ecosystems. On a converged network, that redundant analog-to-digital conversion with active and passive splits is no longer needed. With AES67 multicast onto the network, an engineer simply subscribes to the sources needed, and the signal splitting is done. The network switch is the most powerful audio splitter ever made, and it costs less than the road case for a splitter that sells for thousands. It's time to stop splitting and start multicasting.
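The "split" really is just a group membership. A minimal sketch in Python shows the idea: the `IP_ADD_MEMBERSHIP` socket option is what triggers the IGMP join, and from then on the switch replicates the stream to every subscriber. The group address and port here are illustrative, not from any real device.

```python
import socket
import struct

MCAST_GRP = "239.69.83.67"   # illustrative AES67-style multicast group
MCAST_PORT = 5004            # common RTP port for AES67 streams

def open_receiver(group: str, port: int, iface: str = "0.0.0.0") -> socket.socket:
    """'Subscribe' to a stream: setting IP_ADD_MEMBERSHIP makes the host
    emit an IGMP join, and the switch starts forwarding the multicast."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: 4-byte group address + 4-byte local interface address
    mreq = socket.inet_aton(group) + socket.inet_aton(iface)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

Once joined, every console, recorder, and monitor feed that wants "Stage Mic 1" opens its own receiver; the transmitter never knows or cares how many splits exist.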
We have been weaving, making, and breaking audio patches for as long as we have needed them. As the quality of microphones went down, the quantity of them went up; as studios and facilities grew larger, the need for flexibility grew with them. Multiple sources from different rooms were strung through buildings and hard-wired for a limited, specific purpose. Because AES67 uses a layer 3 networking protocol, the need for single-application, rigid physical infrastructure disappears. As these signals converge onto a network, we are not patching them from source to destination; the destination joins a multicast group via IGMP. As sources populate the network, any receiver can gain access to them. We are no longer using patch cables to connect sources and destinations; we are trading a Session Description Protocol (SDP) document and using software to request a stream delivered through IGMP. We're not patching, we're joining IGMP streams!
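The SDP document being traded is just a few lines of text that tell a receiver where to join and what it will hear. A minimal sketch, assuming a hypothetical AES67-style announcement (the addresses and session name below are invented for illustration):

```python
# Hypothetical AES67-style SDP, not from any real device.
EXAMPLE_SDP = """\
v=0
o=- 1423986 1423994 IN IP4 192.168.1.10
s=Stage Mic 1
c=IN IP4 239.69.83.67/32
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L24/48000/2
a=ptime:1
"""

def parse_sdp(sdp: str) -> dict:
    """Extract the multicast group, port, and encoding a receiver must join."""
    info = {}
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=IN IP4 <group>/<ttl>
            info["group"] = line.split()[-1].split("/")[0]
        elif line.startswith("m=audio"):
            # m=audio <port> RTP/AVP <payload type>
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<pt> <codec>/<rate>/<channels>
            codec, rate, channels = line.split(" ", 1)[1].split("/")
            info.update(codec=codec, rate=int(rate), channels=int(channels))
    return info
```

Hand `parse_sdp(EXAMPLE_SDP)` to routing software and it has everything needed to issue the IGMP join: group 239.69.83.67, port 5004, 24-bit audio at 48 kHz, two channels.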
Traditionally we push audio to a distant load through a transmission line. Signals on a network do not need to be distributed the way they were in the past: with AES67 and SMPTE ST 2110-30, you subscribe to the stream, and the network's group management protocol, the Internet Group Management Protocol (IGMP), handles the joins for you. The classic audio distribution amplifier is over 50 years old: 1 input, 8 outputs; 1 transmitter to 8 fixed receivers. With AES67/SMPTE ST 2110-30 you have 1 transmitter and (n) receivers. The need to distribute signals is gone, because they are available anywhere and everywhere at the same time. AES67 signals are discovered using mechanisms such as mDNS/DNS-SD (Bonjour, used by RAVENNA), the Session Announcement Protocol (SAP, used by Dante Controller), and the Session Initiation Protocol (SIP). These discovery mechanisms are byproducts of the telecom industry.
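Of those discovery mechanisms, SAP is the simplest: a small header followed by the SDP itself, periodically multicast so receivers can build a directory of available streams. A rough sketch of an RFC 2974-style announcement, with an arbitrary message-id hash and no authentication or compression:

```python
import socket
import struct

def build_sap(origin_ip: str, sdp: bytes) -> bytes:
    """Minimal SAP announcement: version 1, IPv4 origin, SDP payload.
    0x20 sets the 3-bit version field; auth length 0; msg-id hash arbitrary."""
    header = struct.pack("!BBH", 0x20, 0, 0x1234) + socket.inet_aton(origin_ip)
    return header + b"application/sdp\x00" + sdp

def parse_sap(packet: bytes):
    """Return (origin address, payload type, SDP payload) from a SAP packet."""
    flags, auth_len, msg_id = struct.unpack("!BBH", packet[:4])
    origin = socket.inet_ntoa(packet[4:8])
    rest = packet[8 + auth_len:]
    ptype, _, payload = rest.partition(b"\x00")
    return origin, ptype.decode(), payload
```

A directory application simply listens on the well-known SAP group, parses each announcement, and presents the embedded SDPs as a list of subscribable sources.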
When the front-of-house mixing console, a hundred-plus feet from the stage, receives microphone signals, it is very susceptible to noise. If the ground at the console sits at a different potential than the ground at the stage, you're in trouble. Ditto the TV studio, radio studio, recording studio, mobile truck, etc. With AES67 and networking technologies there is no need for a common system ground, and no benefit to one, because your digitally encoded sound is now packetized data. The sound is not susceptible to external environmental interference within the transmission path. You cannot bond fiber-optic glass to earth, and optical pulses are not susceptible to electromagnetic interference. Ethernet cables are transformer-isolated from the electrical system they operate on, and Power over Ethernet derives its own reference relative to the switch that generates it. Everything is floating relative to everything else!
Word clock is an imperfect one-way communication system for synchronization. Because of its unidirectional transmission, a receiver of word clock cannot know how far off it is from the generator: word clock is a one-way conversation with no start, no finish, and therefore no linear reference. With AES67 and ST 2110-30, audio is synchronized using the Precision Time Protocol (PTP), defined by the IEEE in IEEE 1588. PTP is a conversation between a master clock server and a client edge device. A best-master-clock algorithm elects the best available master, keeping the network in sync and never without a clock. A PTP master clock can be generated by a local oscillator or disciplined to an external reference. And there is no clock source on earth quite like the one not on the earth: using GPS synchronization, (n) stations, remotes, or devices can be perfectly synchronized and locked continents apart. SMPTE ST 2059 and PTP v2.1 are the new timepieces for your audio and video networks. What good is word clock when you can be world clocked?
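The "conversation" is what word clock never had. In PTP's delay request-response exchange, four timestamps let the client solve for both its offset from the master and the path delay, assuming the path is symmetric:

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """
    Two-way PTP exchange (times in any consistent unit):
      t1: master sends Sync          t2: client receives Sync
      t3: client sends Delay_Req     t4: master receives Delay_Req
    Assuming a symmetric path, solve for the client's offset from the
    master and the one-way path delay.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay
```

For example, a client whose clock runs 5 units ahead of the master across a 2-unit path yields exactly those numbers back: `ptp_offset_and_delay(100, 107, 110, 107)` returns an offset of 5 and a delay of 2. Word clock, being one-way, can never perform this calculation.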
There is no naturally occurring stereo sound, and if that is true, there is no naturally occurring surround sound either. All naturally occurring sound is monophonic; the rest of what we hear must be reflections and influences of the acoustic environment. Mono sounds must be positioned somewhere in a playback environment to give position, depth, width, and now (finally) height information relative to the listener's position. With a traditional 5.1 audio system, we bus signals to emulate the above.
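That bussing ultimately reduces to gain coefficients. A minimal sketch of the familiar constant-power pan law shows how one mono source gets positioned between a pair of loudspeakers; a 5.1 panner is the same idea extended across more speaker pairs:

```python
import math

def constant_power_pan(position: float):
    """
    Bus one mono source to a stereo pair with a constant-power pan law.
    position: -1.0 (hard left) .. +1.0 (hard right).
    Returns (left gain, right gain); L^2 + R^2 is always 1.
    """
    angle = (position + 1.0) * math.pi / 4.0   # map to 0 .. pi/2
    return math.cos(angle), math.sin(angle)
```

At center, both gains are about 0.707 (-3 dB each), so the perceived level stays constant as the source sweeps across the image.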
With AES70 we can now encode a set of Open Control Architecture parameters to deliver positional information, with what could become sample accuracy, in a stream of data. Pan, tilt, dolly, roll, truck, pedestal, and geo-positional information for a microphone can be captured in the AES70 class architecture. As in a key-length-value encoding, you need a Key, its Length, and the Value. These parameters can then be rendered by a multichannel speaker system, or encoded against head-related transfer functions in personal headphones. Why be surrounded by sound when you can be immersed in it?
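The key-length-value pattern is easy to sketch. The record layout and the parameter keys below are hypothetical, purely to show the mechanics; they are not real AES70 class or property identifiers:

```python
import struct

# Hypothetical parameter keys for positional metadata (not real AES70 IDs)
PAN, TILT = 0x0001, 0x0002

def klv_encode(key: int, value: bytes) -> bytes:
    """Pack one parameter as Key (2 bytes), Length (2 bytes), Value."""
    return struct.pack("!HH", key, len(value)) + value

def klv_decode(blob: bytes):
    """Walk a buffer of concatenated KLV records, yielding (key, value)."""
    offset = 0
    while offset < len(blob):
        key, length = struct.unpack_from("!HH", blob, offset)
        offset += 4
        yield key, blob[offset:offset + length]
        offset += length
```

Because every record carries its own length, a renderer can skip keys it does not understand, which is what lets the parameter set grow over time without breaking older receivers.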
DeMux - Mux
Some things are done iterating. The serialized transmission of multiplexed video is coming to an end. With a converged network, we are seeing things once bundled together split apart and scattered across the network as unique streams. With SMPTE ST 2110, the audio, data, and video are unique multicasts. The best solution for managing these devices is NMOS (Networked Media Open Specifications). Its model refers to these elements in a hierarchy of Sources, Flows, and Grains: a device exposes Sources, each Source is represented by one or more Flows, and each Flow is a sequence of time-stamped Grains carrying the audio, data, or video. A receiver takes in Flows and splits them back into individual Grains. Goodbye multiplexing, hello Source. Goodbye de-muxing, hello flow-grains.
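The Source/Flow/Grain hierarchy can be sketched as three small types. The field names here are illustrative only, not the actual NMOS IS-04 JSON schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Grain:
    timestamp: str      # PTP-derived origin timestamp
    payload: bytes      # one video frame, audio chunk, or data unit

@dataclass
class Flow:
    format: str                        # e.g. "urn:x-nmos:format:audio"
    grains: List[Grain] = field(default_factory=list)

@dataclass
class Source:
    label: str
    flows: List[Flow] = field(default_factory=list)

# A microphone device exposes a Source, carried as one audio Flow of Grains.
mic = Source("Stage Mic 1")
pcm = Flow("urn:x-nmos:format:audio")
mic.flows.append(pcm)
pcm.grains.append(Grain(timestamp="1423986:0", payload=b"\x00" * 288))
```

Where the old mux stuffed audio, data, and video into one serial bitstream, here each essence type is its own Flow, addressable and subscribable on its own.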
Inputs are dynamic. As systems evolve and changes are made on the fly, it is very hard to keep an input list 100% up to date. Staying current takes paper and runners, unless everyone is using the same living document. 'Living documents' such as Google Sheets or Smartsheet give every operator immediate access to the current information; everyone is now, literally, on the same sheet. With AES67, your patching can be automated: you never change the parameters of the transmitter, you only change what is being received where. If all of your multicast sources were in a database, with metadata, it could be used to dynamically update your console's inputs. Once you realize that most modern digital consoles are just database-driven analog console emulators, you wonder why we are still passing around paper input lists full of numbers instead of a database of raw, accurate data.
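A sketch of that idea: keep every multicast source in a database and generate a console's input list by query instead of on paper. The schema and sample rows here are hypothetical:

```python
import sqlite3

# In-memory database of multicast sources; a facility would persist this.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sources (
    label TEXT, room TEXT, mcast_group TEXT, port INTEGER, channels INTEGER)""")
db.executemany("INSERT INTO sources VALUES (?,?,?,?,?)", [
    ("Host Mic",  "Studio A", "239.69.0.10", 5004, 1),
    ("Guest Mic", "Studio A", "239.69.0.11", 5004, 1),
    ("PGM Mix",   "Control",  "239.69.1.1",  5004, 2),
])

def input_list(room: str):
    """One query replaces the paper input list for a given room."""
    cur = db.execute(
        "SELECT label, mcast_group, port FROM sources WHERE room = ? ORDER BY label",
        (room,))
    return cur.fetchall()
```

Rename a source or move it to another room, and every console querying the table sees the change immediately; no runners, no stale paper.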
Sound recording is an electrical, mechanical, electronic, or digital inscription of sound waves. In AES67 you have none of the above: data packets containing the transmission are moved over a network. Within those packets is a representation of the digital inscription, but the packet itself is not the inscription, and therefore not a recording. Wireshark can capture the 'packet storm' on a network; if all of those packets were intercepted and captured to a reliable, readable capture format, one would accomplish the 'recording' of AES67. This process captures the audio on the network without any introduced signal processing, conversion, or dependence on synchronization. The captured packets are individually time-stamped and can be re-assembled for playback. We're not recording anymore - we're capturing.
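Re-assembly works because every captured packet carries an RTP header with a sequence number and timestamp. A minimal sketch decoding the fixed 12-byte header from a captured UDP payload:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header, exposing the per-packet
    sequence number and timestamp that make re-assembly possible."""
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # RTP version, always 2
        "payload_type": b1 & 0x7F,     # e.g. 96 for dynamic L24 audio
        "sequence": seq,               # detects loss and reordering
        "timestamp": timestamp,        # media clock position of first sample
        "ssrc": ssrc,                  # identifies the sending stream
        "payload": packet[12:],        # the raw audio samples
    }
```

Sort captured packets by sequence number, lay their payloads out along the timestamp axis, and the original audio is back, with gaps from any lost packets plainly visible.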
Anthony P. Kuzub is the IP Product Manager for Ward-Beck Systems and a vice chairman of the Toronto chapter of the Audio Engineering Society. He also hosts the informational site networking.audio if you want more info on standards and technologies. He can be reached at firstname.lastname@example.org.