Archive for February 2010

Training at Disney Animation II: Size Matters

February 28, 2010

In the world of animation, the world can be as big as a basketball and a basketball can be as big as the world. In the real world, size matters. Not in the animation world. In the world of animation audio, size does not seem to exist either. Low frequencies don't have a wavelength in ProTools. Sure, they look all stretched out compared to the HF samples, but there are no worries that they are too big to fit on the hard drive. This is not just an issue for this particular studio, but for studio world in general. Such spaces are controlled environments. Acoustically sterile. Tracks can be synthesized, spliced, re-sampled to another frequency range, mixed with another track from across the world with little regard for the acoustics of the space. The physical nature of acoustics is far removed from the thoughts of people in these quiet spaces. Those of us in live sound never have the luxury of NOT thinking about the effects of the local acoustics, and the loudspeakers that fill them. Live sound people live in an acoustic combat zone, battling the interaction of multiple speakers, open microphones, stage noise, reflections and more. Studio sound folks can isolate and control sounds to create a clean sound or, if desired, purposefully create an acoustic combat zone for the listeners.

Things we need to know to do our job, and things we accept and move on

Digital audio is pretty much magic to me. Sure, it is 1's and 0's, a clock frequency and sample rates and all, but I don't visualize shift registers and memory locations as a filter. I turn a knob on my digital parametric and I look at the result on the analyzer. Its amplitude and phase responses are indistinguishable from those of the analog filters that I understand down to the resistors, capacitors and op-amp feedback loops. The end result is that I accept the digital filter as functionally equivalent and move on to do my job: set the filters to the best shape for the application. Each of us has unique areas that we accept and move on from in order to specialize in our area of interest. Don't ask me to operate your mix console artistically, but I will be happy to show you lots of interesting things, scientifically, about what happens to the signal in its custody.

In the world of studio mixing, the focus is on the acquisition and manipulation of the "media": a recorded bit stream of audio. Once captured, the media is free of the constraints of physical size until its final delivery to the audience. The only place where size matters is in the differences between the mix room and the showroom. Here is where an interesting set of proportions plays out: a tiny minority of listeners (the mixers) will do the vast majority of the listening in the small space of the studio. The vast majority of listeners (the audience) will do a tiny minority of the listening (perhaps even just once, rather than hundreds of times like the mixer) in the showroom. So the link between the rooms is critical. We have a very short window for people to experience the details the mixers worked so hard to create in their controlled studio environment.

My job here was to train the engineers in how to operate the analyzer. The analyzer measures differences; this is pretty much its entire capability: differences in level, time, phase, distortion, added noise. This is exactly the tool needed to link the studio and the showroom. But the tool is useless without the knowledge of how the differences are shown, and the vital link of how those differences are perceived. Night-and-day differences can be shown on the screen that have no perceptible sonic counterpart. Conversely, we can enact audible differences that will be invisible to our analyzer. It is going to be important to know the difference.

It was not surprising to me that the media engineers there have spent little time considering the acoustical physics at play. It is not their job. The acoustics of the rooms and the speakers are provided by others. Unless there are gross deficiencies in the mixing room setup they can move ahead with their work. Each individual knows which rooms they like, but the central criterion for these engineers is how well the mixing rooms predict the listening experience in the showroom. It is possible, with extensive ear training, to become extremely competent at hearing the translation and memory-mapping the difference in order to anticipate the effects. It is highly improbable, in my opinion, that one can figure out how to affect these two different spaces in order to make them the most similar without a thorough understanding of the physical nature: the size, shape, speed of sound, and the mechanisms in our human anatomy related to sonic perception. A mix engineer predicts the translation; a system and/or acoustical engineer affects the translation. The system engineer's role is to help the mix engineer's predictions come true.

The relative analyzer: Human and machine

Disney Animation's purchase of the SIM dual-channel FFT analyzer creates an opportunity to open a window into the physical nature of sound. The analyzer's renderings are purely objective: it displays only what physically exists, i.e. no prediction, no simulation. This does not mean it displays things exactly as we hear them. It measures relationships, some of which we experience directly, some indirectly. For example, let's listen to a recorded violin track. ProTools can show us the sampled waveform of the track: amplitude vs. time. The analyzer can show us the spectral content: the relative levels over frequency over a given period of time. ProTools can (or at least could someday) show you this as well. That is still the easy part, because we still have a one-dimensional rendering of the signal: level vs. frequency. This can also be done with a Real-Time Analyzer, the king of one-dimensional audio analysis.

Where the analyzer breaks out on its own is in the relative response: the response of the violin at my ear – compared to its own response in its recorded electronic form.  The analyzer can see peaks and dips, how much time it took to travel, how different the arrival time is over frequency (phase over frequency), how much HF was lost in the air, how much energy the room reflections added to the low end, how much noise was added to the signal and more. These modifications to the waveform all come from physical realities, and therefore, the best solutions come with an understanding of the physical nature.

The analyzer sees the difference between an electrical signal and an acoustical one. Humans can't do that unless they have an XLR input in addition to their ears. We hear only the end product, complete with its acoustical effects. We are, however, doing our own version of relative analysis. When we hear the violin track we are comparing it in our brain to our memory file of violin sounds. This history file gives us a variety of reference points to consider: how close we are to the violin, how big the room is, how live the room is, whether the violin is mic'd or on a pickup, the presence of nearby surfaces that create a strong reflection to mar the tone, and much more. If the violin sounds close, we have the perspective of being on stage. If it has a long reverberation tail we are cued to believe that we are in a large reflective room. If the picture in front of us is a distant violin playing outdoors, our brains will know the scene is implausible.

Two other relative analyzers for humans (and other animals) are used for localization. Binaural localization compares the signals at the left and right ears for arrival time and level. Sound from the left side arrives earlier and louder at that ear and we localize the sound there. For vertical localization we use the comb filter reflection signature of our outer ear to determine up and down. The outer ear is asymmetric in shape, and therefore a different set of reflections guides the sound into our ear from above than from below. We compare the reflection structure riding on the incoming signal to the memory-mapped signature set in our heads and match it up to find the vertical location.
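The binaural time cue is easy to put a number on. Here is a rough sketch, assuming the simple far-field model (extra path to the far ear = ear spacing times the sine of the angle); the `itd_us` helper and the 0.18 m spacing are my illustrative assumptions, not measured anatomy:

```python
import math

# Far-field interaural time difference (ITD): the extra path length to the
# far ear is roughly d * sin(angle), traveling at the speed of sound.
def itd_us(angle_deg, ear_spacing_m=0.18, c_m_per_s=343.0):
    """ITD in microseconds for a source angle_deg off center (90 = fully to one side)."""
    return 1e6 * ear_spacing_m * math.sin(math.radians(angle_deg)) / c_m_per_s

print(round(itd_us(90)))   # 525 -- about half a millisecond, the largest time cue we get
print(itd_us(0))           # 0.0 -- dead center, both ears hear it together
```

Half a millisecond is the whole budget our left/right analyzer works with, which is why we resolve such tiny time offsets so well.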

The FFT analyzer operates differently: give it any two signals and they can be compared. Differences in time and level over frequency are directly shown. The analyzer does not need to know what a violin sounds like to know if it is being accurately reproduced. Whatever sound is generated can be the reference for comparison.

The next level is the relative/relative. We can compare the response at one location to another, or at a given time to another, or both. We can look at the studio response in multiple locations, look at the studio compared to the theater, etc. Our human analyzer has this capability as well, but getting specific data is a considerable challenge. One can walk around a room and observe the differences, or play a reference track at the studio and then take it to the theater. While it is not so difficult to walk around and hear differences in overall level and gross frequency response characteristics, and to spot a strong peak here and there, it is very difficult to pinpoint a cause such as a 2 ms difference in arrival between two speakers or a 3 ms reflection off the console. It is possible that a person walking around the room can find these things and more by ear alone and propose the best solutions. I could win the lottery too. The probability of success goes up greatly if we add the analyzer results to the data set (we don't have to stop listening and walking) and supplement our ears with information that we cannot directly experience. Our ears never directly say: the time gap between the direct and reflected sound is 8 ms. They hear it. The data is in there, but you can't access it. With our analyzer this number pops right out and the resulting peaks at 125 Hz, 250 Hz, 375 Hz and up will be as clear as the Himalayas. But in order to get these answers, we will need to know enough about acoustic behavior to have this info at our fingertips.
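The arithmetic behind those comb peaks is simple enough to sketch. Assuming the textbook single-reflection model (the `comb_peaks` helper is hypothetical, just for the arithmetic, not anything in SIM), an 8 ms gap puts the first peak at 1/0.008 s = 125 Hz and every multiple above it:

```python
# Single-reflection comb filter: the direct sound plus a copy delayed by
# t seconds peaks wherever the reflection arrives in phase, i.e. at
# multiples of 1/t, with nulls halfway between.
def comb_peaks(delay_s, f_max):
    """Frequencies (Hz) of the comb-filter peaks up to f_max (hypothetical helper)."""
    spacing = 1.0 / delay_s              # first peak, and the spacing between peaks
    return [spacing * k for k in range(1, int(f_max / spacing) + 1)]

# The 8 ms direct/reflected gap from the text:
print(comb_peaks(0.008, 500))            # [125.0, 250.0, 375.0, 500.0]
```

This is the "pops right out" part: read the peak spacing off the screen, invert it, and you have the reflection's delay.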

To be continued

SIM3 Training for Disney Animation Studios

February 22, 2010

This week I spent two days at the Disney Animation Studios in Burbank, California. This is the place where major animation films are created. Unless you have been living under the Rock of Gibraltar you know what these films are. Mermaids, Beauties, Beasts, Princesses and more come to life in this space in the form of hand-drawn and digitized artwork. It is a fascinating place, nestled in a neighborhood full of the studios of Warner Brothers, Universal and other major players. This studio is quite different from the traditional movie lots in that a great volume of material is generated from a small amount of real estate.

The animation audio product is similar to, and different from, its counterpart in the world of flesh-and-blood actors in front of cameras. The recording process for animation dialog can be much more carefully controlled, since there is no need for on-site microphones with all of their challenges with noise and synching. The final stages, the mixing of audio and its inevitable translation into the cinema and home environment, face the same challenges whether the originals are animated or live-action. The cinema challenge is about standards of reproduction. The media leaves the studios and is reproduced in a new space: cinemas, homes and whatever else. The creative designers, audio and video, must have faith that their work is accurately represented out in its public form.

This is a very different world from our live audio perspective. A live show has no requirement to adhere its reproduction to an ongoing standard. If the guy mixing ZZ Top thinks he wants some more 400 Hz, then who is to argue with him? The 80 mic lines coming INTO the mixing console are not a finished product to be shipped to the listeners. The tom drum mic may have severe leakage issues from the snare. Reproducing it "accurately" could be an idiotic idea. The finished product in live sound is inherently, and continuously, a closed loop of self-calibration. The mix engineer constantly readjusts the system like the hydrodynamic stabilizers that continuously keep a cruise ship upright.

Where a standard is applicable in live sound is between the mix position and the audience – and that is where the worlds of live sound and cinema sound meet. In live sound, the self-calibrated mix location meets the audience at the same time, just beyond the security fence. In studio world, the self-calibrated mix position meets the audience in another room, at another time.  Creativity is the king in the mixing space, but objectivity is the only hope for accurately translating that creative undertaking to our listeners –whether it be live or Memorex.

Standards of reproduction

The cinema world has long adhered to standard practices in order for its media to be accurately represented. The audio standards were quite lax historically, but have made great strides in the last 30 years with standards, verification and testing brought to the industry through THX, Dolby and others. It is not my intent to provide a history of cinema sound here. Suffice to say that, unlike live sound, we can measure a sound system in a room and have a target response that we can see on an analyzer. The reason is that, unlike live sound, the media presented to the speaker system IS the finished product and therefore can be objectively verified. A reproduction that is more accurate (closer to the standard) is objectively superior to one that is less accurate. If there is a peak at 400 Hz in a live sound system, the mix engineer can, and will, modify the mix to reduce 400 Hz until the offending frequencies are tamed. If a cinema playback system is left with such a peak, it will be there for all to hear. If the speaker in the recording room is left with such a peak, the inverse will occur. There is no feedback loop between the recording studio playback monitors and the house speakers. This loop can only be closed by adherence to a standard in both locations.

A simple case in point is subwoofer level. Live engineers consider this a continuously variable creative option. In cinema world the sub level MUST be set objectively or major points of impact will be either over- or under- emphasized.

The SIM3 Analyzer

The folks at Disney Animation have added a SIM3 System to their inventory of acoustic analysis tools. This is an excellent tool for the job of calibrating studios and large spaces, and for verifying that such spaces match. My purpose over these two days was to train a group of engineers there in how to operate the analyzer and to open their perspectives to how measurable physical acoustics affects their work and its translation. The addition of SIM3 opens up a lot of doors for cinema sound. Adherence to standards stands to be greatly improved by the use of a complex, high-resolution FFT analyzer such as SIM.

In the next part I will describe some of the interesting things that came up during our two days there.

Here is a photo of the Disney Animation studios from their web site. An interesting note is that the building was built by McCarthy Construction Company. This was my family's company (I am 5th generation) and I expected to grow up and join it. Instead I did neither: blame it on rock and roll. But either way, I guess I would have ended up here!

Latest news………

February 12, 2010

Feb 12:  Added War story #2

Feb 11:  added Part IV to the impulse response saga. That’s all I have to say on that for the moment, pending any comments or questions

School news:  We just added a SIM school on to my schedule: New York City,  May 17-20



February 11, 2010

I started playing guitar at about age 7 or 8. Nobody thought about the fact that I am a lefty. That came up years later. Too late. That’s always been my excuse for why I can’t play – and I am sticking with it.

Ancient history time. So here I am at 16 years old with my wannabe Les Paul Custom. When I bought this "Les Paul" I thought it was the real deal. Instead of a Gibson Les Paul Custom, it was a Custom (the Japanese brand) Les Paul. This was actually a big upgrade AFTER my clear plexiglass guitar.

This is my 1970 Gibson SG Standard. Poor old thing; I have had it since 1973 (I was 17, it was $235.00). It has been pretty much the only solid-body electric I have ever played. Very thin neck, the thinnest you will ever find. I replaced the standard tail-piece and tuners; otherwise it would never stay in tune for a second. No need for a whammy bar anyway. The neck is so thin you can get vibrato just by squeezing your hand.

This poor old thing needs an extreme makeover. I gutted the electronics ages ago and installed an active preamp inside. That is long gone. Now it is a digital guitar: 1 or 0. Just an on-off switch.

This is my Taylor G-4. It is a great acoustic guitar. The pickup sound does not thrill me but its pure acoustic tone is warm and full. I did not have a quality acoustic guitar until my wife Merridith said this needed to happen for my 50th b-day. I had played a starter Alvarez with a laminate top. That thing had a warped top that looked like a wave pool. Bridge was lifted, neck was bent. Intonation? Huh? I figured it just sounded bad because I can't play. My wife thought it was the guitar. Now with the Taylor we know the guitar sounds good, so it proves I was right.

So here is the real Gibson Les Paul Studio. Got this about 5 years ago. Someday I will find the right thing to do with this guy. I have it set up with George Benson flat-wound strings right now, 013 or 014 at the top. The strings are as stiff as aircraft cable with 200 kg of speakers under it. Feels like playing a piano (from the inside). Pretty nice for a Benson or Wes kind of sound, and it doesn't feed back.

If you have bought a copy of my book, you have contributed to this guitar. This was the reward for pushing and pushing to finish the book. This is my SERIOUS guitar, a 1978 vintage Gibson Johnny Smith. This was made during the last years of the Kalamazoo custom shop and is a very pretty piece of work. It is an L-5 body and an ES-400 neck. The pickups are suspended over the body – even the electronics float on the pick guard. The guitar has lots of natural sound – which gives a great tone but is prone to feedback. Someday I will learn how to play well enough to do this guitar justice.

This is the new kid. Just two months old. A Breedlove "Bossa Nova". This is a nylon string with an internal pickup/mic combination. It has a great acoustic sound but its electronics knock me out. Plug this guy in and it is fat and big. One note never sounded so good. I barely ever played a nylon string before but this has really caught on for me.

The last one is my Traveler guitar. This is a very funny instrument. The tuners are mounted in the middle of the body so that the overall length is reduced. This also protects the tuners, so the thing can be stuffed in the overhead of an airplane. Works well enough to take on the road. I hardly ever plug it in but it does the job for drills etc.

Phase Alignment of Subs – Why I don’t use the impulse response

February 8, 2010

OK. The saga continues down below……………………….

What is the best way to phase align our subwoofers to the mains? There is a hint in the way the question was phrased. I didn't say time align (and it is not because I am afraid of copyright police). I say phase align because that is precisely what we will do. Simply put, you can't time align a subwoofer to the mains. Why? Because your subwoofer is stretched over time: the highest frequencies in your subwoofer can easily be 10-20 ms ahead of the lowest frequencies. Whatever delay time you choose leaves you with a pair of unsettling realities: (a) you are only aligning the timing for a limited (I repeat, LIMITED) frequency range, and (b) you are only aligning the timing for a limited (I repeat, LIMITED) geographical range of the room. So the first thing we need to come to grips with is the fact that our solution is by no means a global one. There are two decisions to make: what frequency range do we want to optimize for this limited partnership, and at what location.

Let's begin with the frequency range. What makes the most sense to you? 30 Hz (where the subs are soloists), 100 Hz (where the mains and subs share the load) or 300 Hz (where the mains are soloists)? This should be obvious. It should be just as obvious that since we have a moving target in time, there is not one timing that can fit for all.

Analogy: a 100-car freight train crosses the road in front of you. What time did the train cross the road? The answer spans 5 minutes, depending on whether you count the engine, the middle of the train, or the end. Such it is with the question: when does the subwoofer arrive? (The same is true of: when does the main arrive?) How do we couple two time-stretched systems together? In this case it is pretty simple. We will couple the subwoofer train right behind the mains. The rear of the mains is 100 Hz and the front of the subs is the same. We will run the systems in series. The critical element is to link them at 100 Hz. (I am using 100 Hz as an example; this can, and will, vary depending upon your particular system.)

The procedure is simple: measure them both individually, view the phase and adjust the delay until they match. You have to figure out who is first and then delay the leader to meet the late speaker. This will depend upon your speaker and mic placement. I say this is simple, but in reality it is quite difficult to see the phase response down here. Reflections corrupt the data; it is a real challenge. Nonetheless, it can be done. It's just a pain.
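The "delay the leader" arithmetic can be sketched numerically. Assuming we have already read each system's phase at the crossover frequency off the analyzer (the `align_delay_ms` helper is a hypothetical illustration, not a SIM function), a phase difference of D degrees at frequency f is a time offset of (D/360)/f seconds:

```python
# Convert a phase difference at the crossover frequency into a delay setting.
def align_delay_ms(phase_main_deg, phase_sub_deg, f_xover_hz=100.0):
    """Delay (ms) for the leading system: positive means delay the main,
    negative means delay the sub. Note: a single reading is ambiguous by
    whole cycles (+/-10 ms at 100 Hz); the phase trace over frequency,
    not one number, tells you which cycle is the right one."""
    diff = (phase_main_deg - phase_sub_deg) % 360.0
    if diff > 180.0:                     # take the shorter way around the circle
        diff -= 360.0
    return diff / 360.0 * 1000.0 / f_xover_hz

# Main reads -90 deg and sub reads -180 deg at 100 Hz: the sub lags by a
# quarter cycle, so delay the main by a quarter of a 100 Hz period.
print(align_delay_ms(-90.0, -180.0))     # 2.5
```

The cycle ambiguity in the docstring is exactly why the fuzzy phase trace has to be watched over a range of frequencies rather than read at a single point.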

When I get a moment I will post up some pics to show a sub phase-align in the field. 

Wouldn't it be nice if there were a simpler method? Like using the impulse response to get a nice simple answer directly in milliseconds, instead of having to watch the fuzzy phase trace. It is absolutely true that the impulse response method is easier. In my next post I will explain why the easy way lacks sufficient accuracy for me to ever use with a client.

******************  Part II *****************************************

FFT measurement questions and answers

The first thing to understand about an impulse response is that it is a hypothetical construct. This could, to some extent, also be said about our phase and amplitude measurements, but it is much more apparent, and relevant, with an impulse response.

The response on our analyzer is always an answer to a question. The amplitude response answers the question: what would be the level over frequency if we put in a signal that was flat over frequency? This is not hard to get our heads around. If we actually put in a flat signal (pink noise) we would see the response directly in a single channel. If not, we can use two channels and see the same thing as a transfer function. This makes it a hypothetical question: what would the response be with a flat signal, even if we use something like music.
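A minimal numpy sketch of that two-channel idea: divide the output spectrum by the input spectrum, and the excitation's own shape drops out of the answer. The toy "system under test" here (a gain plus a circular delay) is my assumption, chosen so the demo is exact:

```python
import numpy as np

# Dual-channel transfer function: output spectrum divided by input spectrum.
# Because we divide, the excitation need not be flat -- music would work too.
rng = np.random.default_rng(0)
N = 4096
x = rng.standard_normal(N)            # any excitation (stand-in for music)
y = 0.5 * np.roll(x, 3)               # hypothetical system under test: gain 0.5,
                                      # 3-sample delay (circular, for an exact demo)

H = np.fft.rfft(y) / np.fft.rfft(x)   # transfer function: level and time vs. frequency

mag_db = 20 * np.log10(np.abs(H))     # amplitude response: -6 dB at every frequency
delay_samples = -np.angle(H[1]) * N / (2 * np.pi)   # phase slope at the first bin: 3 samples
```

Even though the excitation was random noise, the transfer function reports the system's flat -6 dB level and its 3-sample delay: the hypothetical "what would happen with a flat signal" answer.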

Same story with phase but this gets more complex. Seen any excitation signals with a flat amplitude AND phase response?  You won’t find that in your pink noise. Pink noise achieves its flat amplitude response only by averaging over time. Random works for amplitude – but random phase – yikes – this will not get us any firm answers. In the case of phase we need to go to the transfer function hypothetical to get an answer – the phase response AS IF we sent a signal with flat phase in it. Still the answer is clear: this is what the system under test will do to the phase response over frequency.

Impulse response

The impulse response display on our FFT analyzer answers this question: what would be the amplitude vs. time response of the system under test IF the input signal were a "perfect" impulse. OK, so what is a perfect impulse? A waveform with flat amplitude AND phase. That can't be the pink noise described earlier, because pink noise has random phase. So what is it? A single cycle of every frequency, all beginning at the same time. Ready, set, GO, and all frequencies make a single round trip and stop. They all start together, the highest frequency finishes first, and the lowest finishes last. If you looked at this on an oscilloscope (amplitude vs. time) you would see the waveform rise vertically from a flat horizontal line, go to its peak and then return back to where it started.
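This is easy to verify numerically. A one-sample impulse really does contain every frequency at equal level, all starting together at time zero (a numpy sketch of the concept, not anything SIM-specific):

```python
import numpy as np

# A "perfect" impulse: one sample up and back, zero everywhere else.
N = 1024
impulse = np.zeros(N)
impulse[0] = 1.0

spectrum = np.fft.rfft(impulse)
flat_amplitude = np.allclose(np.abs(spectrum), 1.0)   # every frequency at equal level
flat_phase = np.allclose(np.angle(spectrum), 0.0)     # all starting together at time zero

print(flat_amplitude, flat_phase)   # True True
```

Flat amplitude AND flat phase: exactly the two conditions named above, and exactly what pink noise cannot deliver.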

IF the "perfect" impulse is perfectly reproduced it will rise and fall as a single straight line. The width of the line (in time) relates to the HF limits of the system. The greater the HF extension, the thinner the impulse. As the HF range diminishes, the shortest round trip takes more time, and as a result the width of the impulse response thickens as the rise and fall reflect the longer timing. A system with a flat phase response has a single perfect rise and fall in its impulse response, and a VERY important thing can be said about it: a single value of time can be attributed to it. The train arrives at 12:00 pm. All of it.
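The width/bandwidth trade can be sketched with an ideal bandlimited impulse, which is a sinc pulse. The half-height width used here is my own choice of yardstick, just to make the comparison concrete:

```python
import numpy as np

# An ideal bandlimited impulse is a sinc pulse; its main lobe widens as the
# HF limit of the system comes down.
t = np.linspace(-0.01, 0.01, 20001)      # +/-10 ms at 1 microsecond resolution

def mainlobe_width_ms(f_high_hz):
    """Half-height width (ms) of the impulse for a system flat up to f_high_hz."""
    pulse = np.sinc(2 * f_high_hz * t)   # ideal bandlimited impulse
    idx = np.where(np.abs(pulse) > 0.5)[0]
    return 1000 * (t[idx[-1]] - t[idx[0]])

wide_range = mainlobe_width_ms(16000)    # full-range system: thin impulse
narrow_range = mainlobe_width_ms(2000)   # less HF extension: thicker impulse
print(wide_range < narrow_range)         # True
```

Less HF extension, longer shortest round trip, thicker impulse: the same statement as the paragraph above, in numbers.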

The impulse response on the FFT analyzer is not an oscilloscope. We do not have to put in a perfect impulse. We will use a second-generation transfer function, the inverse Fourier transform (IFT), which is derived from the transfer function amplitude and phase responses. This is the answer to the hypothetical question: what would the amplitude vs. time response be IF the system were excited by a perfect impulse.
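As a sketch of that second-generation computation (the toy system, a gain with a 5-sample delay, and the use of circular convolution are my assumptions, chosen so the demo stays clean):

```python
import numpy as np

# Measure the transfer function with noise, then take the inverse transform
# to get the impulse response -- without ever playing an impulse.
rng = np.random.default_rng(1)
N = 2048
x = rng.standard_normal(N)                 # excitation: noise, not an impulse
true_ir = np.zeros(N)
true_ir[5] = 0.8                           # hypothetical system: gain 0.8, 5-sample delay
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(true_ir)))   # system output (circular)

H = np.fft.fft(y) / np.fft.fft(x)          # transfer function (amplitude and phase)
ir = np.real(np.fft.ifft(H))               # impulse response via the inverse transform

print(np.argmax(np.abs(ir)))               # 5 -- the arrival, in samples
```

The excitation never contained an impulse, yet the inverse transform of the transfer function answers the hypothetical question anyway.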

If the system under test does not reproduce the signal in time at all frequencies, then the impulse response shape will be modified. Any system that does NOT have a flat amplitude and phase response will see its impulse response begin to be misshapen. Stretching and ringing, undershoot and overshoot will appear around the vertical peak. Once we are resigned to a non-flat phase response we must come to grips with the fact that a single time value can NOT describe the system. The system is stretched. The time is stretched. The impulse is stretched.

This is where the FFT impulse response can be misleading. We can easily see a high point on the impulse response, even one that is highly stretched. Our eyes are naturally drawn to the peak, and most FFT analyzers will automatically have their cursors track the peak, leading us to a simple answer like 22.4 ms for something that is stretched 10 ms to either side of that. And here is where we can really get into trouble: we can nudge the analyzer around to get a variety of answers to the same question (e.g. the same speaker) by deciding how we want to filter time and frequency: ALL OF WHICH ARE POTENTIALLY MISLEADING BECAUSE NO SINGLE TIME VALUE CAN DESCRIBE A STRETCHED FUNCTION.
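Here is a toy version of the trap. The "stretched" response below is entirely made up (a sharp early HF arrival plus a smeared, later, lower-level LF arrival), but it shows how the peak-finder dutifully reports only the HF time:

```python
import numpy as np

# A made-up "time-stretched" impulse response: a sharp HF arrival at 22.4 ms
# plus smeared LF energy starting around 30 ms at lower level.
fs = 48000
t = np.arange(int(0.050 * fs)) / fs                 # 50 ms of time

hf = np.sinc((t - 0.0224) * 8000)                   # sharp, early HF energy
lf = 0.6 * np.sin(2 * np.pi * 60 * (t - 0.030)) * ((t >= 0.030) & (t <= 0.0467))
ir = hf + lf                                        # the stretched response

peak_ms = 1000 * t[np.argmax(np.abs(ir))]           # what the peak-tracking cursor reports
print(round(peak_ms, 1))                            # 22.4 -- the HF arrival only
```

The LF energy arriving 8 ms later never even registers in the single number, which is the whole problem when the LF timing is exactly what we are trying to align.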

Did I mention that all speakers (as currently known to me) are time stretched? So this means something pretty important. The simplistic single number derived from an impulse response cannot be used to describe ANY speaker known (to me), especially a subwoofer.

Does a stretched impulse response tell you what frequencies are leading, and by how much? Good luck.  You would have a better chance decoding a German Enigma machine than divining the timing response over frequency out of the impulse. This brings us back to the heart of the problem with our original mission: we are trying to link the low frequencies of the main speaker (100 Hz) to the high frequencies of the subwoofer (100 Hz). The peaks of these two respective impulse responses are in totally different worlds. They are both strongly prejudiced toward the HF ranges of their particular devices which means the readings are likely to be the timings of 10 kHz and 100 Hz respectively. 

Simple answers for complex functions. Not so good.  That’s it for the moment. Next I will describe some of the different ways that impulse responses can be manipulated to give different answers and when and where the impulse response can provide an accurate means of setting delays.

********************** Part III ***********************************************

The linear basis  of the impulse response

Those of us using the modern FFT analyzers that are purpose-built for pro (and amateur) audio have been spoiled. We have grown so accustomed to looking at a 24 or 48 point/octave frequency response display that we forget that this is NOT derived from logarithmic math. The FFT analyzer can only compute the frequency response in linear form. The quasi-log display we see is a patchwork of 8 or so linear computations put together into one (almost) seamless picture. Underlying this is the fact that the composite picture is made up of a sequence of DIFFERENT time record lengths. Bear in mind that the editing room floor of our FFT analyzer is littered with unused portions of frequency data. We have clipped and saved only about half the frequency response data from any of the individual time records.

How does this apply to the impulse response? In a very big way. The impulse response is derived from the transfer function frequency response (amplitude and phase). It is a 2nd-generation product of the linear math. The IR is computed from a single frequency response, from a single time record, which means it comes from LINEAR frequency response data. The inverse Fourier transform (IFT) cannot be derived from the dissected and combined slices we use for the frequency response. The IR cannot contain equal amounts of data taken from a 640 ms, 320 ms, 160 ms and so on down to 5 ms time record to derive its response. Think it through: there is a time axis on the graph. It has to come from a single time event.

The IR we see comes from a single LINEAR transform. The importance is this: linear data favors the HF response. If you have 1000 data points, 500 of them are in the top octave, 250 in the next one down, and so on. This means that our IR peak, where the "official" time will be found, is weighted in favor of the highest octave. If you have a leading tweeter, the IR will find it ahead of the pack (in time and level). The mids and lows will appear as lumpy foothills behind (to the right of) the Matterhorn peak. If you have a lagging tweeter, the IR will show the lumpy foothills ahead of the peak (to the left), but the peak will still be the highest point. Our peak-finding function will still be drawn to the same point: the peak.
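The 1000-point arithmetic above can be checked directly (the `bins_per_octave` bucketing helper is hypothetical, just for the counting):

```python
# Count how many linearly spaced frequency points land in each octave,
# working from the top octave down.
def bins_per_octave(n_bins, f_max):
    df = f_max / n_bins                  # linear spacing: every bin the same width
    counts = []
    hi = f_max
    while hi / 2 >= df:                  # stop when an octave falls below the first bin
        counts.append(sum(1 for k in range(1, n_bins + 1) if hi / 2 < k * df <= hi))
        hi /= 2
    return counts

# 1000 linear points out to 20 kHz: half of them sit in the top octave.
print(bins_per_octave(1000, 20000))      # [500, 250, 125, 63, 31, 16, 8, 4, 2]
```

By the time you get down to the subwoofer's octaves there are only a handful of points left: the linear transform's vote is overwhelmingly cast by the top end.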

Now consider a comparison of arrival between two speakers. If they both extend out to 16 kHz (mains and delays) then the prejudice of the IR in favor of the HF response evens out. If we find the arrival time for both we can lock them together. Their response will be in phase at 16 kHz and remain in phase as we go down (TO THE EXTENT THAT THE TWO SPEAKER MODELS ARE PHASE MATCHED). This is a PARALLEL operation: 10 kHz is linked to 10 kHz, 1 kHz to 1 kHz and 100 Hz to 100 Hz for as long as they share their range. If the speakers are compatible, one size fits all and the limitations of the IR are even on both sides of the equation. If they are not compatible over frequency, we will need to see the PHASE response to find where they diverge, and enact solutions within this viewpoint. More on that later.

Now back to the subs…………

It should be clear now that the linear favoritism over frequency will NOT play out evenly in joining a sub to a main speaker. This is also true of aligning a woofer and tweeter in a two-way box. This problem holds for ANY spectral crossover tuning. Linear frequency math does not have a fair and balanced perspective over frequency. If you are looking at devices with different ranges, they are subject to this distortion. The location of the peak found in our IR is subject to the linear focus. If the main speaker is flat, the peak will be found where there are more data points: the top end, 4 to 16 kHz. All other frequency ranges will appear RELATIVE (leading or lagging) to this range. If you have a speaker that is similar to 100% of the speakers I have measured in the last 26 years, then one thing is certain: the response at 100 Hz is SUBSTANTIALLY behind the response we just found at 8 kHz.

The sub is NOT flat (duh!!) so there is a tradeoff game that goes on in the analyzer. As we lose energy (frequency rising) we gain data points (linear acquisition), so the most likely place the peak will be found is in the upper areas of the subwoofer range and/or somewhat beyond, before it has been too steeply attenuated. If you have a subwoofer that is similar to 100% of the speakers I have measured in the last 26 years, then one thing is certain: the response at 30 Hz is SUBSTANTIALLY behind the response we just found at its upper region.
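A quick way to see this lag is to band-limit an impulse down to a sub-like passband and watch the peak slide later in time, purely from filter group delay. A sketch (scipy assumed; the 30–100 Hz 4th-order Butterworth is an invented stand-in, not a model of any real sub):

```python
import numpy as np
from scipy import signal

fs = 48000
impulse = np.zeros(fs)   # 1 second of time record
impulse[0] = 1.0

# Stand-in "sub": 30-100 Hz passband, 4th-order Butterworth
sos = signal.butter(4, [30, 100], btype='bandpass', fs=fs, output='sos')
h_sub = signal.sosfilt(sos, impulse)

peak_full_ms = np.argmax(np.abs(impulse)) / fs * 1000
peak_sub_ms = np.argmax(np.abs(h_sub)) / fs * 1000
print(f"full-range peak: {peak_full_ms:.2f} ms, sub peak: {peak_sub_ms:.2f} ms")
# The sub's peak lands milliseconds later - pure filter group delay,
# before the speaker itself has added anything of its own
```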

One of the reasons I have heard given for using the IR values alone to tune spectral crossovers (subs+mains, or woofer+tweeter) is that the IR gives us “the bulk of the energy” for each driver, and aligning “bulk of energy 1 + bulk of energy 2 = maximum bulk of energy.” Sounds good in text. But it does NOT work that way. You are making a series connection at a specific freq range, not a parallel connection (where bulk might apply). Furthermore, the bulk formula is flawed anyway – because the linear freq nature of the IR means that the two “bulks” are weighted differently.

********************** Part  IV ******************

There are a variety of ways to compute an impulse response on an FFT analyzer. All of them have an effect on the shape of the response, how high the peak goes, and where (in time) the peak is found. Without going hard into the math we can look at the most decisive parameters.

VERY SIMPLIFIED IR Computation Features

1) The length of time included after time zero (the direct sound), in seconds, milliseconds etc.: This differs from the actual time record captured, since there is positive and negative time around time zero – but the math there is not important. In the end we have a span of time included in the computation. This puts an end-stop on our display – we can’t see a 200 ms reflection if we have only 100 ms of data after the direct sound. We could, however, choose to display less than the full amount of data we have. The visual may be a cropped version of the computation, or it could be the full length. The capture time also limits how low we can go in frequency. We can’t see 30 Hz if we only have 10 ms of data. Most IR displays have the option of large amounts of time, so getting low frequencies included will not be a big issue. The fact that the frequency response is LINEAR means that frequency weighting favors the HF – no matter how long or short our capture is.
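The capture-time limit is simple arithmetic – a time record T long resolves frequency in steps of 1/T, so the lowest (non-DC) bin sits at 1/T:

```python
# Capture time in milliseconds -> lowest resolvable frequency in Hz
def lowest_bin_hz(capture_ms):
    return 1000.0 / capture_ms

print(lowest_bin_hz(10))    # 100.0 Hz - a 10 ms record cannot show 30 Hz
print(lowest_bin_hz(500))   # 2.0 Hz - long records reach the sub range easily
```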

2) Time increments/FFT resolution/sample freq: How fine do we slice the response in time? The finer the slices, the more detail we will see. More slices = higher frequencies. If we slice it into 0.02 ms increments (50 kHz sample rate) we can see up to 25 kHz. If we slice at lower sample rates, the frequency range goes down. The same speaker, measured over the same amount of time, with different sample rates/time increments will include different frequency ranges – and therefore MOST LIKELY will have its impulse peak centered at a different time. This is important. The speaker did not change, but our conclusions about it did. This is a non-issue if we are comparing two speakers that each cover the same range – they would both have the same shift applied to them. But if we have one speaker with a full HF range and one without, the playing field just got tilted. If one speaker really has no HF, and the other one does – but it is filtered by the analyzer – then we can NOT assume that synchronizing the two peaks will put the speakers in phase.
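The sample-rate ceiling is equally simple arithmetic – the visible range tops out at half the sample rate (Nyquist):

```python
# Sample rate in kHz (i.e. 1 / time increment in ms) -> highest visible freq
def nyquist_khz(sample_rate_khz):
    return sample_rate_khz / 2.0

print(nyquist_khz(50))   # 25.0 kHz - the 0.02 ms / 50 kHz case above
print(nyquist_khz(20))   # 10.0 kHz - same speaker, but the top octave-plus
                         # has just been computed out of existence
```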

3) Vertical scale: Linear/Log: The uncultured version of the IR is linear in time, freq and level. This means that things that go negative will peak downward while positive movement goes upward. Polarity (and its inversion) can be seen. The downside of this is that the linear vertical scaling translates very poorly visually for seeing the details of the IR such as late arrivals, reflections, etc. Worse yet is trying to discern level differences in linear. The Y axis does not read in dB. It reads in a ratio and this has to be converted. Upward peaks have a positive value and downward have a negative value. The strength of an echo can be computed by the ratio of the direct to the echo levels – and converted by the 20 log formula into dB. Where it gets strange is when you try to compute positive direct sound against a negative-going reflection.
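The conversion itself is just the 20 log formula; a sketch of the normal case and the strange one (numpy assumed, with invented example levels):

```python
import numpy as np

# Direct sound peaks at 1.0 on the linear scale; a reflection peaks at 0.5,
# or dips to -0.5 if polarity-inverted. Converting the ratio to dB:
direct, echo = 1.0, 0.5
print(round(20 * np.log10(echo / direct), 2))           # -6.02 dB

# A polarity-inverted reflection of the same strength needs abs() before
# the log - exactly the spot where the linear display gets strange
inverted = -0.5
print(round(20 * np.log10(abs(inverted) / direct), 2))  # also -6.02 dB
```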

The log version is obtained by the Hilbert transform and shows the vertical scale in dB. But the downside is that there isn’t a downside. Pun intended. What I mean is that the negative side of the impulse is folded over with the positive and these are combined into a single log value. This can now be displayed in dB since everything is going one way. This has various names: Energy-Time-Curve (ETC) among others. The visual display is blind to the polarity, but I am told by Sam Berkow that the cursor in SMAART shows whether the energy is positive or negative – even though it all displays positive.
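A minimal sketch of the fold-over, using scipy’s hilbert (the toy IR – a direct arrival plus a polarity-inverted reflection at half level – is invented for illustration):

```python
import numpy as np
from scipy.signal import hilbert

fs = 48000
t = np.arange(0, 0.02, 1 / fs)
# Toy IR: positive direct arrival at 2 ms, polarity-INVERTED reflection
# at 10 ms at half the level
ir = np.exp(-(t - 0.002) ** 2 / 1e-8) - 0.5 * np.exp(-(t - 0.010) ** 2 / 1e-8)

envelope = np.abs(hilbert(ir))   # fold the negative lobes in with the positive
etc_db = 20 * np.log10(envelope / envelope.max() + 1e-12)

# Both arrivals now show as upward peaks on the dB scale. The display is
# blind to the reflection's inverted polarity - it simply sits ~6 dB down.
```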


So once again we are back to the same place. If you are going to use the impulse response alone (I say you because it will not be me) to align speakers in different freq ranges, you are prone to computational artifacts that will affect the HF and LF sides of the equation differently. One technique I have seen advocated is to push the sample freq down so low that the upper regions of the HF speaker are filtered out. The idea is this: if the Xover is 100 Hz, then drop the resolution of the analyzer down to filter out the region above 100 Hz in the HF speaker. Then we will see the impulse response at 100 Hz of BOTH speakers – and VOILA, we have our simple answer. BUT – one impulse response (the HF) has filtered the device by computation – the other (the LF) is filtered by a filter. We have a merger of the VIRTUAL – a computationally created phase shift and freq response filtering (which we don’t hear) – with an ACTUAL – the filter response of the Xover. It is possible that the value for the impulse will give the correct reading so that the Xover is actually in phase – possible, not probable – but we won’t know until we measure the phase – which is the whole point of this exercise.
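The virtual/actual split can be demonstrated directly: zeroing FFT bins above 100 Hz (the computational route) is a zero-phase operation whose peak never moves, while a real low-pass filter delays its peak in time. A sketch (scipy assumed; the 4th-order 100 Hz Butterworth stands in for a hypothetical Xover filter):

```python
import numpy as np
from scipy import signal

fs = 48000
n = 4096
imp = np.zeros(n)
imp[0] = 1.0

# ACTUAL: the LF speaker's real crossover low-pass at 100 Hz
sos = signal.butter(4, 100, fs=fs, output='sos')
h_actual = signal.sosfilt(sos, imp)

# VIRTUAL: "filtering" by computation - zero every bin above 100 Hz
H = np.fft.rfft(imp)
H[np.fft.rfftfreq(n, 1 / fs) > 100] = 0
h_virtual = np.fft.irfft(H, n)

print(np.argmax(np.abs(h_virtual)))  # 0 - zero-phase math, the peak never moves
print(np.argmax(np.abs(h_actual)))   # samples later - real filters delay
```

The two “100 Hz low-passed” responses do not share a peak time, so matching their IR peaks does not match their phase.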

Simply put: why bother with a step-saving solution (Xover alignment by IR) if it is so prone to error that you have to do the second step (Xover alignment by phase) anyway? If a step is to be skipped it is the IR – not the phase.


Some updates and future topics

February 3, 2010

I added some graphics to the discussion on Cardioid subwoofers, and ABC of line array tuning. I should have some graphics up for the coherence discussion later this week. I am also in the middle of an old war story. 

I was thinking about a few future topics to meander on about:

1) Why I don’t use the impulse response to set the timing for subwoofers

2) The emotional baggage of equalization

3) What topics I cover in my seminars

Whenever I am away from my computer I think of great topics then………………poof

Any suggestions?