Archive for the ‘Studio and Cinema’ category

The Emperor’s New Stereo

March 9, 2011

I was contacted a few months back by Jose Luis Diaz about an article I wrote for Mix magazine in 1998. He asked whether I had a copy of the original in PDF form. No. I am not the best archivist. 😦

Well it turns out he had a Spanish translation of it and he RETRANSLATED it back to me. 🙂

It’s funny for me to see the old article and the extremely crude drawing quality of that era. As for the subject matter itself, it still holds up pretty well. Not too long ago I went to another concert with a five-piece jazz band where the piano was on the left and the guitar on the right. We had really great seats on the left side. The piano, drums and bass were fresh and clear. The guitar I heard only when it came back off the wall on the right side. Bet it sounded great at FOH.

So here it is … once again. And if you want the Spanish version, go here


The Emperor’s New Mix

Unveiling the stereo myth on live sound

(Bob McCarthy Mix Magazine January 1998)

Once upon a time, there was an emperor living in a giant palace.

After mixing some tracks in his private studio, the emperor was so happy with the stereo image that he decided to throw a concert for his 5000 closest friends.

For the occasion, he bought a new luxuriously advanced stereo sound system.

Before the show started, the emperor told the audience what the sound system salesman had said to him:

”This system has such magic qualities that it’s capable of creating perfect stereo imaging in every seat. Every person who doesn’t experience stereo imaging is, obviously, vulgar and unfit for his job.”

Everyone was seated to the left and to the right of the center walkway.

The sound system was set up so that all the seats were inside the coverage area of the left and right P.A. towers.

The concert began.

The emperor was sitting in the center of the room, and he marveled at his own sophistication. The stereo image was perfect!

Everyone else shuffled in their seats, realizing how vulgar they were and the danger they faced of losing their jobs if they were caught. To them, the sound appeared to come almost exclusively from the P.A. tower nearest their location.

When the concert finished, all the guests congratulated the emperor over the vivid stereo image they had experienced. Everything seemed to go well until a little boy, putting words to everyone’s thoughts, said:

”Why did all the music except the tom drum come from the right speaker?”

What the boy had said was true, and everyone knew it.

For some reason, the stereo image only worked in the very center of the room. How could this be? Was there something wrong with the sound system? With the mix? With the room acoustics? None of the above.


There is one simple and irrefutable problem: stereo effects don’t scale when moved from a studio to a bigger room. You could have all the stereo coverage needed for every seat, but that doesn’t mean you’ll experience stereo imaging when you leave the center.

Everyone agrees that stereo spatialization is better perceived from the center. But in a studio, or in a living room, one can move freely over a large part of the room and still experience reasonably effective stereo.

Try it yourself: play a well-mixed track in your living room, sit directly in front of the left speaker and close your eyes. Although you are off-center, it’s still possible to place the instruments at different horizontal locations between the speakers. Now try it again in front of the left P.A. tower, 30 meters away in a concert hall. No more gradual horizontal movement between the two sides. The image stays almost exclusively in the left speaker.

Keep your eyes closed, and slowly head toward the center of the room (be careful!) until you reach a point where you find the same panoramic image you experienced in your living room. Be objective! This is about real experience, not expected results. Surely you will be standing just a few steps away from the center of the room, not much farther than in your living room.

The distance you can travel in your living room while retaining acceptable stereo imaging is almost the same as you can travel in a 5000 seat concert hall before you lose spatialization.


Panoramic location between two sound sources depends on two interrelated factors: time differences and intensity differences. Let’s analyze intensity differences first. Gradually turn the pan pot on your console to the right. You have now created a level difference between the channels, favoring the right one; thus the stereo image (as expected) moves to the right.

This happens as long as you remain seated centered between the two speakers. If you’re sitting to either side, the image won’t move the way the pan pot does. Why? Here comes the defining factor in sound localization: time difference.

We locate the image toward whichever source arrives first at our ears, even if the time difference is minimal and the later source is more intense. The psychoacoustic relationship between these two factors is known as the ”precedence effect” and was analyzed around 1950 by, among others, the now famous Dr. Helmut Haas.

The ”sweet spot” for binaural localization (stereo imaging) lies within the first millisecond of time difference. If the time difference exceeds 5 milliseconds, the sound image can only be moved by brute force: the channel that arrives last must be about 10 dB louder than the first to pull it over.
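These thresholds can be sketched as a toy decision rule. This is a deliberate simplification for illustration only (real localization shifts gradually, not at hard cutoffs); the 1 ms, 5 ms and 10 dB figures are the ones quoted above:

```python
# Hypothetical sketch of the precedence-effect rules described above.
# The hard thresholds (1 ms, 5 ms, 10 dB) come from the article text;
# actual perception is a continuum, not a set of sharp boundaries.

def localization_regime(time_diff_ms, level_diff_db):
    """Rough regime for the image between two sources.

    time_diff_ms:  arrival-time difference (later minus earlier), in ms
    level_diff_db: how much louder the LATER arrival is, in dB
    """
    if time_diff_ms <= 1.0:
        return "true stereo imaging (pan pot works as expected)"
    if time_diff_ms <= 5.0:
        return "image pulls toward the earlier arrival"
    # Beyond ~5 ms, only brute force moves the image:
    if level_diff_db >= 10.0:
        return "image forced to the louder, later source"
    return "image locks to the earlier source"

print(localization_regime(0.5, 0.0))   # a near-center seat
print(localization_regime(44.0, 6.0))  # far off-center in a big hall
```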

Now this is where the scale concept really comes alive.

Time and intensity differences don’t translate equally when we scale from a small space to a large one.

The intensity difference is a proportion between the levels of the two sources (the two speakers, the two channels…). The intensity relationship between the left and right channels is the same in your living room as in a stadium. If you’re standing at twice the distance from one speaker as from the other, the intensity difference will be 6 dB. This remains the same whether the distances are 1.5 and 3 meters, or 15 and 30 meters.

The time difference, however, is not a proportion. It is simply the DIFFERENCE in the arrival times of the two sources.

While the intensity difference was kept constant in the previous example, the time difference is multiplied by 10 when the path-length difference grows from 1.5 meters (approx. 4.4 ms) to 15 meters (44 ms).
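A quick sketch of the arithmetic, assuming simple inverse-square (free-field) level falloff and a speed of sound of 343 m/s:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 °C

def level_diff_db(d_near, d_far):
    # Inverse-square law: the level difference depends only on the RATIO
    # of the two distances, so it survives scaling unchanged.
    return 20 * math.log10(d_far / d_near)

def time_diff_ms(d_near, d_far):
    # The arrival-time difference depends on the absolute path-length
    # DIFFERENCE, so it grows with the room.
    return (d_far - d_near) / SPEED_OF_SOUND * 1000

# Living room: 1.5 m and 3 m from the two speakers
print(level_diff_db(1.5, 3.0))   # ≈ 6.0 dB
print(time_diff_ms(1.5, 3.0))    # ≈ 4.4 ms

# Stadium: same proportions, ten times the size
print(level_diff_db(15.0, 30.0)) # still ≈ 6.0 dB
print(time_diff_ms(15.0, 30.0))  # ≈ 44 ms, ten times larger
```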

Given that time difference is the predominant factor in sound localization, you can see that the odds are against you when trying to achieve stereo at large scale.

Because we only have a 5 ms window to control the image, the usable space to recreate stereo in a stadium is, in proportion, really small compared to your living room. In other words, the horizontal area needed to experience true stereo localization (the space where the images can be situated) is barely larger in a stadium than it is in your living room.

Nobody wants to admit that there is no stereo for the big crowds. From a mix engineer’s point of view, stereo represents an advantage. If he is mixing from the center of the room, it’s easier to pick out each instrument in the mix if they are panned across the horizontal image. Plus, it’s more fun this way.

The diagram shows a concert room and a living room. The living room is in scale to the concert room. The light-shaded area in the living room drawing shows the area where the time difference between sources is less than 5 ms. This is the area where true stereo is achieved.

The same shading in the concert room is where one would assume you could obtain stereo imaging. The dark-shaded area shows the real area where stereo works properly in a concert room.


The search for a stereo image can degrade frequency-response uniformity if the speakers are arranged with too much overlap in their coverage areas.

Signals panned to the center, almost always the most important channels, will arrive at different times at seats far from the center. This causes severe comb filtering and changes the frequency response for each listener.

Comb filtering, or combing, is one of the side effects of combining signals that aren’t in sync. The time differences change the phase relationship between the two speakers at every frequency. At any location, the frequency response obtained will depend on the phase relationship between the two signals. When the phases match, there will be full addition. When the phases are opposite, there will be full cancellation.

At any point in between, the combined signal will have neither full addition nor full cancellation. Instead, it will show a series of audible peaks and dips in the response. Each change in location carries a different time difference between the left and right channels and, because of this, a new phase relationship, resulting in a new series of peaks and dips in the frequency response.

The irregularities caused by combing are most severe when two signals have the same intensity but different arrival times.

The more you try to spread the stereo by increasing the overlap area of the speakers, the more audible the peaks and dips will be. This is not to be taken lightly. A sound system with a large overlap area will have variations of up to 30 dB in frequency response over a bandwidth that changes from seat to seat, turning EQ into something completely arbitrary. A short 1 ms delay will create a one-octave hole at 500 Hz, and the damage scales from there. Longer delays degrade intelligibility and sound quality even further.
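The 1 ms example can be checked with a little arithmetic. For two equal-level copies of a signal offset by a delay, cancellations fall where the delay equals an odd number of half-cycles, and additions where it equals a whole number of cycles. This sketch simply lists those frequencies:

```python
def comb_nulls(delay_ms, count=4):
    """First few cancellation frequencies (Hz) for two equal-level
    copies of a signal offset by delay_ms milliseconds."""
    delay_s = delay_ms / 1000.0
    # Cancellation wherever the delay is an odd number of half-cycles
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

def comb_peaks(delay_ms, count=4):
    """First few addition (peak) frequencies for the same pair."""
    delay_s = delay_ms / 1000.0
    # Addition wherever the delay is a whole number of cycles
    return [k / delay_s for k in range(1, count + 1)]

print(comb_nulls(1.0))  # [500.0, 1500.0, 2500.0, 3500.0]
print(comb_peaks(1.0))  # [1000.0, 2000.0, 3000.0, 4000.0]
```

The first null for a 1 ms offset lands at 500 Hz, which is the one-octave hole the article describes; halve the delay and the whole pattern shifts up an octave.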

If the stereo image is the top priority, then you should fully pan the channels and make the overlapping coverage area of the speakers fill the room. The only way to beat time difference is to force it with intensity. Although this expands the stereophonic area, you will be left with terrible level differences between channels at both sides of the room. Meanwhile, channels panned to the center will have a variable response over the listening area, caused by the combing that comes with all that overlap.

This technique was used for many years by a nameless touring band, which hard-panned several of its musicians. In the center of the listening area, the stereo was fantastic.

However, fans that couldn’t arrive early to the shows, in order to get seats in the center, would have to choose between listening to the left drummer and the guitar player, or, the right drummer and the keyboard player.

If the priority is to make the entire band enjoyable for the whole audience (and I expect it to be), then leave stereo as a special effect. Design the sound system so that the overlap of the left and right speakers roughly matches the 5 ms time-difference window. Reduce the level of infill speakers so the front and center coverage can be achieved without big overlap zones. Don’t waste your time, energy and money on stereo delays and fills.


All of this may sound radical, maybe even heretical, to many readers. After all, we have put so much time and effort into stereo reproduction in P.A. systems.

It would be awesome if we could achieve stereo in every seat of the room, or even half of them. If a large amount of the audience receives the benefits of stereo imaging, we could argue that combing and intelligibility loss are a reasonable price to pay for it. But it is futile and self-destructive to fight against the laws of physics and psychoacoustics and to pretend that we are experiencing stereo, when we are not. Remember our priorities.

It is unlikely that our customers will raise their voices because they don’t have enough stereo. They certainly will, of course, if everything sounds like a telephone or can’t be understood, two of the most common results of chasing stereo on big shows.

Mono sound reinforcement seems like something we should have discarded long ago for something better, but it has a big advantage over stereo: it works.

This is not a statement that will please the emperor, or the band manager, but it does hold some truth: ”This system has such magic qualities that it’s capable of creating perfect mono imaging in every seat”.

So thank you Jose Luis.

Training at Disney Animation II: Size Matters

February 28, 2010

In the world of animation the world can be as big as a basketball and a basketball can be as big as the world. In the real world, size matters. Not in the animation world. In the world of animation audio, size does not seem to exist either. Low frequencies don’t have a wavelength in ProTools. Sure, they look all stretched out compared to the HF samples, but there are no worries that they are too big to fit on the hard drive. This is not just an issue for this particular studio, but rather the studio world in general. Such spaces are controlled environments. Acoustically sterile. Tracks can be synthesized, spliced, resampled to another frequency range, or mixed with another track from across the world with little regard for the acoustics of the space. The physical nature of acoustics is far removed from the thoughts of people in these quiet spaces. Those of us in live sound never have the luxury of NOT thinking about the effects of the local acoustics and the loudspeakers that fill them. Live sound people live in an acoustic combat zone – battling the interaction of multiple speakers, open microphones, stage noise, reflections and more. Studio sound folks can isolate and control sounds to create a clean sound or, if desired, purposefully create an acoustic combat zone for the listeners.

Things we need to know to do our job, and things we accept and move on

Digital audio is pretty much magic to me. Sure it is 1’s and 0’s, a clock frequency, sample rates and all, but I don’t visualize shift registers and memory locations as a filter. I turn a knob on my digital parametric and look at the result on the analyzer. Its amplitude and phase responses are indistinguishable from the analog filters that I understand down to the resistors, capacitors and op-amp feedback loops. The end result is that I accept the digital filter as functionally equivalent and move on to do my job – set the filters to the best shape for the application. Each of us has unique areas that we accept and move on from in order to specialize in our area of interest. Don’t ask me to operate your mix console artistically, but I will be happy to show you lots of interesting things, scientifically, about what happens to the signal in its custody.

In the world of studio mixing, the focus is on acquisition and manipulation of the “media”: a recorded bit stream of audio. Once captured, the media is free of the constraints of physical size until its final delivery to the audience. The only place where size matters is in the differences between the mix room and the showroom. Here is where an interesting set of proportions plays out: a tiny minority of listeners (the mixers) will do the vast majority of the listening in the small space of the studio. The vast majority of listeners (the audience) will do a tiny minority of the listening (perhaps even just once, rather than hundreds of times like the mixer) in the showroom. So the link between the rooms is critical. We have a very short window for people to experience the details the mixers worked so hard to create in their controlled studio environment.

My job here was to train the engineers in how to operate the analyzer. The analyzer measures differences – this is pretty much its entire capability: differences in level, time, distortion, phase, added noise. This is exactly the tool needed to link the studio and the showroom. But the tool is useless without the knowledge of how the differences are shown, and the vital link of how those differences are perceived. Night-and-day differences can be shown on the screen that have no perceptible sonic counterpart. Conversely, we can enact audible differences that will be invisible to our analyzer. It is important to know the difference.

It was not surprising to me that the media engineers there have spent little time considering the acoustical physics at play. It is not their job. The acoustics of the rooms and the speakers are provided by others. Unless there are gross deficiencies in the mixing-room setup, they can move ahead with their work. Each individual knows which rooms they like – but the central criterion for these engineers is how well the mixing rooms predict the listening experience in the showroom. It is possible, with extensive ear training, to become extremely competent at hearing the translation and memory-mapping the difference in order to anticipate the effects. It is highly improbable, in my opinion, that one can figure out how to affect these two different spaces in order to make them most similar without a thorough understanding of the physical nature: the size, shape, speed of sound, and the mechanisms in our human anatomy related to sonic perception. A mix engineer predicts the translation; a system and/or acoustical engineer affects the translation. The system engineer’s role is to help the mix engineer’s predictions come true.

The relative analyzer: Human and machine

Disney Animation’s purchase of the SIM dual-channel FFT analyzer creates an opportunity to open a window into the physical nature of sound. The analyzer’s renderings are purely objective; it displays only what physically exists, i.e. no prediction, no simulation. This does not mean it displays things exactly as we hear them. It measures relationships – some of which we experience directly, some indirectly. For example, let’s listen to a recorded violin track. ProTools can show us the sampled waveform of the track over time – amplitude vs. time. The analyzer can show us the spectral content – the relative levels over frequency during a given period of time. ProTools can (or at least could someday) show you this as well. That is still the easy part, because we still have a one-dimensional rendering of the signal: level vs. frequency. This can also be done with a Real-Time Analyzer – the king of one-dimensional audio analysis.

Where the analyzer breaks out on its own is in the relative response: the response of the violin at my ear – compared to its own response in its recorded electronic form.  The analyzer can see peaks and dips, how much time it took to travel, how different the arrival time is over frequency (phase over frequency), how much HF was lost in the air, how much energy the room reflections added to the low end, how much noise was added to the signal and more. These modifications to the waveform all come from physical realities, and therefore, the best solutions come with an understanding of the physical nature.

The analyzer sees the difference between an electrical signal and an acoustical one. Humans can’t do that unless they have an XLR input in addition to their ears. We hear only the end product, complete with its acoustical effects. We are, however, doing our own version of relative analysis. When we hear the violin track we compare it in our brain to our memory file of violin sounds. This history file gives us a variety of reference points to consider: how close we are to the violin, how big the room is, how live the room is, whether the violin is mic’d or on a pickup, the presence of nearby surfaces that create a strong reflection to mar the tone, and much more. If the violin sounds close, we have the perspective of being on stage. If it has a long reverberation tail, we are cued to believe that we are in a large reflective room. If the picture in front of us is a distant violin playing outdoors, our brains will flag the scene as implausible.

Two other relative analyzers for humans (and other animals) are used for localization.  Binaural localization compares the signals at the left and right ears for arrival time and level. Sound from the left side arrives earlier and louder at that ear and we localize the sound there. For vertical localization we use the comb filter reflection signature of our outer ear to determine up and down. The outer ear is asymmetric in shape and therefore a different set of reflections guides the sound into our ear from above than below. We compare the reflection structure riding on the incoming signal to the memory mapped signature set in our heads and match it up to find the vertical location.

The FFT analyzer operates differently: give it any two signals and they can be compared. Differences in time and level over frequency are directly shown. The analyzer does not need to know what a violin sounds like to know if it is being accurately reproduced. Whatever sound is generated can be the reference for comparison.

The next level is the relative/relative. We can compare the response at one location to another – or at a given time to another – or both. We can look at the studio response in multiple locations, or look at the studio compared to the theater, etc. Our human analyzer has this capability as well, but it is a considerable challenge to get specific data from it. One can walk around a room and observe the differences, or play a reference track in the studio and then take it to the theater. While it is not so difficult to walk around and hear differences in overall level and gross frequency-response characteristics, and to spot a strong peak here and there, it is very difficult to pinpoint a cause such as a 2 ms difference in arrival between two speakers or a 3 ms reflection off the console. It is possible that a person walking around the room can find these things and more by ear alone and propose the best solutions. I could win the lottery too. The probability of success goes up greatly if we add the analyzer results to the data set (we don’t have to stop listening and walking) and supplement our ears with information that we cannot directly experience. Our ears never directly say: the time gap between the direct and reflected sound is 8 ms. They hear it. The data is in there – but you can’t access it. With our analyzer this number pops right out, and the resulting peaks at 125 Hz, 250 Hz, 375 Hz and up will be as clear as the Himalayas. But in order to get these answers, we will need to know enough about acoustic behavior to have this info at our fingertips.
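The 8 ms figure and the 125 Hz peak series are two views of the same number: the comb-peak spacing is the reciprocal of the time gap. A minimal sketch of that arithmetic, assuming the direct sound and one reflection at comparable levels:

```python
def peak_frequencies(gap_ms, count=3):
    """Addition peaks (Hz) for a direct sound plus one reflection
    arriving gap_ms milliseconds later, at comparable levels."""
    return [k * 1000.0 / gap_ms for k in range(1, count + 1)]

def gap_from_peak_spacing(spacing_hz):
    """Infer the direct-vs-reflected time gap (ms) from the
    comb-peak spacing seen on the analyzer."""
    return 1000.0 / spacing_hz

print(peak_frequencies(8.0))        # [125.0, 250.0, 375.0]
print(gap_from_peak_spacing(125.0)) # 8.0
```

Running it in the second direction is what the analyzer effectively does for us: read the peak spacing off the screen and the reflection's delay pops right out.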

To be continued

SIM3 Training for Disney Animation Studios

February 22, 2010

This week I spent two days at the Disney Animation Studios in Burbank California. This is the place where major animation films are created. Unless you have been living under the Rock of Gibraltar you know what these films are.  Mermaids, Beauties, Beasts, Princesses and more come to life in this space in the form of hand-drawn and digitized artwork. It is a fascinating place, nestled amongst the neighborhood full of the studios of Warner Brothers, Universal and other major players. This studio is quite different from the traditional movie lots in that a great volume of material is generated from a small amount of real estate.

The animation audio product is similar to – and different from – its counterpart in the world of flesh-and-blood actors in front of cameras. The recording process for animation dialog can be much more carefully controlled, since there is no need for on-site microphones with all of their challenges with noise and synching. The final stages – the mixing of audio and its inevitable translation into the cinema and home environment – face the same challenges, whether the originals are animated or live-action. The cinema challenge is about standards of reproduction. The media leaves the studios and is reproduced in a new space – cinemas, homes and whatever else. The creative designers – audio and video – must have faith that their work is accurately represented out in its public form.

This is a very different world than our live audio perspective. A live show has no requirement to adhere its reproduction to an ongoing standard. If the guy mixing ZZ Top thinks he wants some more 400 Hz, who is to argue with him? The 80 mic lines coming INTO the mixing console are not a finished product to be shipped to the listeners. The tom drum mic may have severe leakage from the snare; reproducing it “accurately” could be an idiotic idea. The finished product in live sound is inherently – and continuously – a closed loop of self-calibration. The mix engineer constantly readjusts the system like the hydrodynamic stabilizers that continuously keep a cruise ship upright.

Where a standard is applicable in live sound is between the mix position and the audience – and that is where the worlds of live sound and cinema sound meet. In live sound, the self-calibrated mix location meets the audience at the same time, just beyond the security fence. In studio world, the self-calibrated mix position meets the audience in another room, at another time.  Creativity is the king in the mixing space, but objectivity is the only hope for accurately translating that creative undertaking to our listeners –whether it be live or Memorex.

Standards of reproduction

The cinema world has long adhered to standard practices so that its media is accurately represented. The audio standards were historically quite lax, but have made great strides in the last 30 years through the standards, verification and testing brought to the industry by THX, Dolby and others. It is not my intent to provide a history of cinema sound here – suffice to say that, unlike live sound, we can measure a sound system in a room against a target response that we can see on an analyzer. The reason is that – unlike live sound – the media presented to the speaker system IS the finished product and therefore can be objectively verified. A reproduction that is more accurate – closer to the standard – is objectively superior to one that is less accurate. If there is a peak at 400 Hz in a live sound system, the mix engineer can – and will – modify the mix to reduce 400 Hz until the offending frequencies are tamed. If a cinema playback system is left with such a peak, it will be there for all to hear. If the speaker in the recording room is left with such a peak, the inverse will occur. There is no feedback loop between the recording-studio playback monitors and the house speakers. This loop can only be closed by adherence to a standard in both locations.

A simple case in point is subwoofer level. Live engineers consider this a continuously variable creative option. In cinema world the sub level MUST be set objectively or major points of impact will be either over- or under- emphasized.

The SIM3 Analyzer

The folks at Disney Animation have added a SIM3 System to their inventory of acoustic analysis tools. This is an excellent tool for the job of calibration for studio and large spaces – and for verification that such spaces match. My purpose over these two days was to train a group of engineers there how to operate the analyzer and to open their perspectives up to seeing how measurable physical acoustics affects their work and its translation.  The addition of SIM3 opens up a lot of doors for cinema sound. The adherence to standards can stand to be greatly improved by the use of a complex, high resolution FFT analyzer such as SIM.

In the next part I will describe some of the interesting things that came up during our two days there.

Here is a photo of the Disney Animation studios from their web site. An interesting note: the building was built by McCarthy Construction Company. This was my family’s company (I am 5th generation) and I was expected to grow up and join it. Instead I did neither – blame it on rock and roll. But either way, I guess I would have ended up here!