INTRODUCTION
This article presents an audio-visual exploration of various phenomena observed whilst investigating time domain shifts on individual signals in multi-microphone recordings. In particular, it demonstrates aurally, for the first time, the effect of the author's "Set Phasors to Stun": an algorithm to improve phase coherence on transients in multi-microphone recordings [1], presented at ICA2007 in Madrid. In that paper, the author formed a simple algorithm to enhance transients in multi-microphone recordings and developed a Max/MSP patch to implement it. The patch was capable of automatically detecting the sample offsets deemed to result in maximum resultant amplitude when comparing two or more signals, thus indicating when sounds were most "in phase" according to which range of the audio was being interrogated. The focus of that paper was an examination of transients; although not the primary thrust of the investigation, it was noticed that considerable low frequency enhancement was feasible through time domain manipulations. This now forms a considerable part of the investigation herein.
Whereas a full paper might focus principally on text-based process and conclusions, this multimedia on-line article will allow the reader to hear timbral differences introduced by the patch. Various examples can be heard and in some cases, spectrograms examined simultaneously to allow the reader to form independent conclusions and ascertain appropriateness (and usefulness) of the system. Most excerpts will also feature video capture of the software to allow the reader to reinforce aural perception whilst viewing typical interventions.
The examples presented here represent a range of recording applications, with a special emphasis on recordings produced by Pip Williams who is interviewed elsewhere in this issue. Some of Pip’s recordings both for the current “Status Quo” album and his orchestral recordings for “Nightwish” will be explored. The latter’s ambient environment pushes the algorithm to its limit.
It has long been understood that in a multi-microphone recording, the different lengths of acoustic path from individual point sources to the various microphones induce phase colouration in the resultant signal. At present there is no known way of eliminating this, and it has instead become part of the sound engineer's art to accept it, and even harness it to creative effect. It is also commonplace to attempt temporal manipulations of various forms to counter it, but such approaches currently all involve adjustments made through aural judgement or relatively crude and laborious calculations.
Each and every frequency component of a sound will have a different wavelength and so exhibit a different phase as perceived by a given microphone at a given distance from the source. When multiple (two or greater) microphones are present, the distance of the sound source will more than likely be at least slightly different from each, and so interference will produce some colouration of the resultant sound through comb filtering.
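The comb filtering described above can be quantified. As an illustrative aside (a minimal Python sketch, not part of the system discussed later), the peak amplitude of two equal sinusoids summed with a relative delay follows directly from the phase lag:

```python
import math

def resultant_amplitude(f_hz, delay_s):
    """Peak amplitude of the sum of two unit-amplitude sinusoids at
    f_hz, one delayed by delay_s seconds. From the identity
    sin(x) + sin(x - phi) = 2*cos(phi/2)*sin(x - phi/2)."""
    phase_lag = 2 * math.pi * f_hz * delay_s  # phase lag in radians
    return abs(2 * math.cos(phase_lag / 2))

# A 1 ms inter-microphone delay: near-total cancellation at 500 Hz
# (a comb-filter notch), full reinforcement at 1000 Hz.
notch = resultant_amplitude(500, 0.001)   # ~0
peak = resultant_amplitude(1000, 0.001)   # 2.0
```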
There now follows a section of “Set Phasors to Stun” [1] to assist the reader in understanding the subsequent examples and the background of the concept.
‘Phase is related to distance, differently for each frequency component of any given sound, and so unless the point source was a sine wave emanating equidistantly from each microphone diaphragm, interference is going to produce some colouration of the recorded sound through comb filtering.
“The effect of having the same time delay for all frequencies creates a different phase delay for different frequencies, because different frequencies have different wavelengths. For example, a time delay of 1/1000 of a second would cause a 360° phase difference in a 1000-Hz wave, but only a 180° phase difference in a 500-Hz wave. Thus a cancellation would occur at 500 Hz but not at 1000 Hz. This would correspond to a dip at 500 Hz in the frequency spectrum of the signal.
The phase lag φ, for any frequency f and any given time delay t, may be given by Equation [1]:

φ = 360° × f × t   [1]
When two identical waves separated by a phase lag are added together, their sum is a wave whose amplitude depends on the phase lag.”[2]
When recording, this colouration was typically controlled by the physical adjustment of microphone placement. The audio spectrum contains wavelengths of approximately 17 m to 1.7 cm, and so adjustments of the order of less than a centimetre to a few centimetres would have a profound effect on the perception of the upper frequencies present. The effect was often associated with (even attributed to) natural room ambience and utilised in recordings. Some purists found this “phasiness” undesirable and sought microphone placements that minimised it, often as a primary strategy when positioning these microphones. The phenomenon is most apparent when stereo recordings are summed to mono.
Much contemporary analysis of musical sounds has been based around windowing and phase vocoding techniques. Duxbury et al [7] showed that separation of transients and steady state audio facilitated numerous applications such as transient enhancement and superior time stretching. Such an approach endeavours to maintain the integrity of the perceived musical information.
This paper takes a much simpler approach, closer to that of typical manipulations the record producer might perform with short delay lines when dealing with phase anomalies embedded in a recording. A device created in Max/MSP that analyses transients and automatically applies the (subjectively) appropriate delay purely in the time domain is presented. This device is analogous to adjusting microphone position (distance from source), and provides the producer with a software environment to tighten transients in a multi-microphone recording, allowing the frequency-dependent artefacts associated with such adjustments to still occur naturally in the steady state portion of the sound. A corollary of this is that the device can also be used to impose tone colour on the steady state portion if preferred.
“Because two microphones separated in space pick up a sound at slightly different times, their combined output will be similar to the single microphone with delayed reflections. Therefore, spaced microphone stereo-pickup arrangements are susceptible to comb-filter problems. Under certain conditions the combing is audible, imparting a phasiness to the overall sound reproduction, interpreted by some as room ambience. It is not ambience, however, but distortion of the time and intensity cues presented to the microphones. It is evident that some people find this distortion pleasing, so spaced microphone pickups are favored by many producers and listeners.” [3]
It is in this spirit that this work is pursued.
THE ALGORITHM
Fig. 1 represents the algorithm hypothesised to implement this theory.
Fig. 1

Functionality:
Channels 1/2 Audio Playback:
“Channel 1 Audio Playback” is a buffer that holds a pre-recorded audio file from one of the microphones in the shared acoustic space. “Channel 2 Audio Playback” is that of another. Each of these is routed through its own sample accurate delay line, the values of which can be set/reset independently. The user can optionally configure the two buffers as a stereo pair, and a dedicated stereo delay line can act on them as a unit. This configuration is iterated N times to accommodate many simultaneous audio signals, allowing various permutations of stereo and mono channels.
2N:2 Routing Matrix:
This simply allows any channel (or pair of channels) to be fed into the subsequent section for comparison against any other channel (or pair of channels).
Iterative Gain Comparator:
This section takes any two signals determined by the Routing Matrix and performs a summation of peaks on each sample. This works on the premise that in a shared acoustic space there will likely be some microphone spillage and so the waveforms of each channel will have similar shapes, albeit with different amplitudes. Paterson [8] said
It has long been established that the generalised cross-correlation (GCC) method can estimate the delay present between two sensors [3]; however, this was shown not to work well in a reverberant environment [4]. This is a good reason to adopt this lateral approach.
“In the reverberant environments, the performances of the conventional Time delay Estimator (TDE) methods are degraded due to interference and reverberation [4]. The main reason is the disagreement between the ideal propagation model and the real signal model in reverberation [4][5].
Therefore, the TDE for the MA system should take account of the room transfer function (RTF) that models the room reverberation [5]. “ [6]
The work of [6] is primarily aimed at adaptive microphone arrays, but it would seem to imply that current technology has not yet addressed the simple convergence of transients, the focus instead being on Fourier-based spectral analysis.
Fig. 2
The Iterative Gain Comparator comprises the:
Maximum Amplitude GaIn Comparator (MAGIC):
The two audio channels' waveforms are monitored for the largest resultant peak when summed. As shown in Fig. 2, one channel's time base is iteratively shifted by a single sample, and after each time shift such a summation is performed. Maximum coherence of the transient peak is detected by recording the time shift that produces the maximum instantaneous gain when the channels are summed. Computational efficiency is maintained by simple peak detection in the audio domain, so computationally intensive convolution is not required.
Time Inc:
A simple incremental counter that increases the delay time in “Test Delay” below.
Test Delay:
This is the delay that shifts the time base iteratively by a single sample as described above. The user must monitor the (numerical) display meters on the GUI, and when a peak is detected, the test delay can be locked and the user can “A-B audition” the effect. The delay value can then be written to the appropriate delay associated with an audio channel or stereo pair of audio channels. Once written, the target audio channel can be locked and the “Test Delay” applied to another channel, etc.
Monitor Output:
Monitoring was done in mono to ensure accurate appraisal of comb filtering and any other artefacts. It should be noted that each “Delay” features the ability for the user to manually adjust the delay time, lock this and compare with the idealised and zero delays. This is discussed further in section 8, Evaluation, below.
In addition, each audio channel features the ability to export the sample accurate delayed audio as a fresh audio file so that further music production can be pursued natively in the DAW of the user’s choice once the desired amount of transient enhancement is achieved.'[1]
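The MAGIC search described in the quoted section above can be sketched in code. This is an illustrative Python reconstruction of the comparator's logic only, not the actual Max/MSP implementation; the function name and search range are assumptions:

```python
def magic_delay(reference, candidate, max_shift=512):
    """Sketch of the Maximum Amplitude GaIn Comparator (MAGIC):
    iteratively delay `candidate` by whole samples and record the
    shift whose sample-wise sum with `reference` has the largest
    instantaneous peak."""
    best_shift, best_peak = 0, float("-inf")
    for shift in range(max_shift + 1):
        # Delay `candidate` by `shift` samples (zero-padded at the front)
        delayed = [0.0] * shift + list(candidate[:len(candidate) - shift])
        peak = max(abs(r + d) for r, d in zip(reference, delayed))
        if peak > best_peak:
            best_peak, best_shift = peak, shift
    return best_shift
```

An exhaustive single-sample search of this kind avoids frequency-domain machinery entirely, in keeping with the paper's stated preference for simple time-domain peak detection over convolution.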
2 THE MAX PATCH
The Algorithm was implemented in a Max/MSP patch.
Fig. 3 shows an overview of the Patch. The area on the left remains in view, but the larger window on the right is horizontally scrollable to reveal the grouped delays. The window on the left allows control of overall gain, phase and which slot receives the test delay, and offers tools such as a sonogram and a peak level comparator. It also has meters which show the peak level of the overall output should the active channels be either in phase or out of phase (simultaneously), as well as numerical readings for delay in samples, milliseconds and delay locking information. The horizontal channels (on the right) are identical, in the fashion of a mixing console.
Fig.4
The upper slot is designated “Master” and it is on this slot that regions can be selected for looping and audition. One such region can be seen in the bluish area [a] in the centre of the waveform display. The “Master” slot happens to be a darker shade since it is currently selected and is being delayed by the “Test Delay”. The manual delays can be set with the faders [b]. Each channel has a gain control [c], and various options for subjective A-B comparison between zero, manual and automated delay times can be seen in the panels indicated by [d]. The stereo pair delay control area is given by [e]. The button indicated by [f] allows a new version of the audio to be written to the hard disk. This audio features the exact numbers of samples delay prepended as digital silence so that this file might be imported into a commercial DAW carrying with it the effect of the patch’s delay.
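As a sketch of what the export step ([f] above) amounts to outside Max/MSP, the following prepends the locked delay as digital silence to a copy of the file. It assumes 16-bit PCM WAV for simplicity (the patch itself works with 24-bit AIFFs), and the function name is hypothetical:

```python
import wave

def export_with_delay(in_path, out_path, delay_samples):
    """Write a copy of a PCM WAV file with `delay_samples` of digital
    silence prepended, so that a DAW import carries the patch's delay
    with it."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # One frame of silence = sampwidth bytes of zeros per channel
    silence = b"\x00" * (delay_samples * params.sampwidth * params.nchannels)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(silence + frames)
```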
3 TEST RESULTS
There now follows a number of demonstrations of the patch in action.
3.1 TESTING
Paterson [8] first demonstrated elements of the following at the ARP conference in Edinburgh 2006. The Amplitude Gain Comparator engine in the patch featured here is derived from the device shown in Edinburgh.
In the AV examples, all (cumulative) delays are A-B-ed with the original state of the recording. The reader should observe the delay times (in samples) in certain screen grabs and on others, simply the red LED indicating that delays are switched in. The framing of the movies is varied, simply to display only relevant bits of the patch.
3.1.1 Programmed Loop
A simple drum loop was programmed in Logic. It was then copied, a cut was made, and the section of audio after the cut, shifted (time-delayed) by a very small random amount. This second version was then rendered to a contiguous new file. These files represent an idealised scenario for testing of delay compensation to verify accuracy. Each has an identical waveform, except that one has a portion occurring later than the other. See Fig. 5.
The two audio files were loaded into the Max patch and a transient was selected. The patch was set to calculate the delay in order to verify its accuracy. Audio was monitored with the two files out of phase so that the user could hear the null point more clearly. The patch detected that a delay of 24 samples would achieve maximum coherence. When the delay was locked and then switched in and out, the audio toggled between just a looping click and the selected audio.
Movie 1
The selected loop was then extended to the entire one bar cycle of audio and with no delay applied, phase inverted playback revealed that the initial segment of the resultant was totally silent, but as expected the time-shifted section was audible with a slight phasiness. When the delay was switched in, this was reversed, with only the first beat being audible. Movie 1 demonstrates this.
It was also found that shifting the delay by a single sample in either direction allowed the silent sections to be clearly heard; clearly, at 44.1 kHz, a single sample error stops complete cancellation. This indicated that the patch was indeed capable of sample accurate phase inversion when confronted with theoretically ideal waveforms. Note also the meters on the left of the patch, which indicate the resultant amplitude when both in phase and out of phase versions are measured simultaneously. This is independent of the audio monitoring.
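Why a single-sample error stops complete cancellation can be illustrated numerically. The sketch below (an illustration only, using an assumed 1 kHz tone at 44.1 kHz) subtracts a compensated copy from a late copy; the residual is zero only when the compensation matches the delay exactly:

```python
import math

SR = 44100

def residual_peak(delay_samples, test_shift):
    """Peak of the residual when a copy recorded `delay_samples` late
    is cancelled against a polarity-inverted copy shifted by
    `test_shift` samples (here, inversion is done by subtraction)."""
    tone = [math.sin(2 * math.pi * 1000 * n / SR) for n in range(SR // 10)]
    late = [0.0] * delay_samples + tone      # the "late" recording
    aligned = [0.0] * test_shift + tone      # the tested compensation
    n = min(len(late), len(aligned))
    return max(abs(late[i] - aligned[i]) for i in range(n))
```

At 1 kHz, an off-by-one compensation leaves a residual of roughly 2·sin(π·1000/44100) ≈ 0.14 of full scale, easily audible against what should be silence.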
Anazawa and Takahashi [12] suggested that when recording an orchestra, spot microphones must be delayed to an accuracy of better than 1ms (although they were working to a sample accuracy of 21 microseconds, and mention using a resolution of 300 microseconds). The finding of the test scenario here suggests that 1ms is not enough, and 22.7 microseconds (one sample at 44.1kHz) proves significant.
3.1.3 Real Tom-Tom
In order to evaluate the effectiveness of the algorithm in a real world acoustic recording with different phases and ambiences, a tom-tom was mic’ed with two sE Titans, deliberately placed at different distances to induce a random phase difference. See Fig. 6.
Fig. 6
The patch detected that a delay of 125 samples would produce an optimum amplitude on a transient. In Movie 2, the seek can be heard at high speed on a small portion of the transient, and then the entire tom hit is looped and the overall effect is apparent. Notice also that a slight flam can be detected on the un-delayed version. This is induced purely by the physical difference in path length of the microphones.
Movie 2
The original recording was noted to have a phase colouration. When the delay was invoked however, there was an obvious clarity and enhancement of the transient.
Although this is an extreme example (in terms of microphone placement for a single sound source), it does demonstrate that the patch worked in the real world and in addition could produce interesting effects in spatial separation comparable to drum kit mic’ing practice.
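The relationship between a compensating delay and the physical difference in microphone path length mentioned above is straightforward to sketch (an illustrative calculation, assuming a speed of sound of roughly 343 m/s at room temperature):

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature
SR = 44100

def samples_to_path_difference(n_samples, sr=SR, c=SPEED_OF_SOUND):
    """Convert a compensating delay in samples to the equivalent
    difference in acoustic path length between two microphones."""
    return n_samples / sr * c

# The 125-sample delay found for the tom-tom corresponds to roughly
# a 0.97 m difference in microphone distance from the source.
print(round(samples_to_path_difference(125), 2))  # prints 0.97
```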
Movie 3
In a more music orientated loop situation such as Movie 3, the delay appears to introduce more punch and fullness. The flam mentioned above appears harder to resolve aurally.
3.1.4 Upright Bass
It was now decided to investigate a pitched instrument. A monotimbral source was desired to simplify harmonic content, but a real world multi-microphone recording methodology was desired. The author produced a jazz album which featured an upright bass recorded with a sE Gemini, a Calrec 1050C and a contact microphone, as seen in Fig.7.
Fig.7
The three microphones had been chosen and placed according to the preference of the hugely experienced engineer, Paul Borg (right). He based these microphone placements on previous successful results and achieved an excellent composite sound in the control room. The Contact Microphone was clearly the “nearest” to the sound source. An iterative process was applied. Firstly, a transient was looped. With the Gemini muted, the Calrec was delayed relative to the Contact, and this delay was locked. The Gemini was then unmuted and it was locked to the Contact/Calrec pair with its own independently calculated delay. The selected transient can be heard with and without the delays in Movie 4.
Movie 4
The loop was then extended to appraise the effect on a musical phrase as can be heard in Movie 5.
Movie 5
Fig.8 shows a sonogram image with a (logarithmic) range of 50Hz-1500Hz, for a looped one bar phrase containing several different pitches and articulations. The increase in low-end energy can be clearly seen as the delays were switched in and out. [8]
The transients seemed more pronounced, and the whole result seemed to benefit from the enhanced bottom end. One anomaly was that when different notes were selected as looping references, the recommended delay differed, typically within a range of +/- 20 samples. This is assumed to be the influence of the fundamental frequency of a given pitch, which will contain the most energy and hence imply the largest resultant amplitude. This is clearly a basic limitation of the system.
In a separate test, the manual delay controls proved most useful too, allowing both subtle and dramatic timbral changes, certainly facilitating the user’s “taste”.
3.1.5 Acoustic Guitar
An acoustic guitar was recorded with two Rode NT2 microphones. The placement of the microphones was deliberately crude in order to explore how the patch might deal with a polyphonic source in extreme circumstances. Fig. 9 shows the best visual indication of the time delay; Max/MSP does not facilitate easy quantitative measurement, but the delay between the left and right channels can be seen to be approximately 132 samples. Again the sonic effect was dramatic, with a large and full sounding low-frequency enhancement and stronger transients. The peak levels rose by 3.5 dB. The low frequency boost here might be considered too much for most applications. When monitored in stereo, the original recording had more integrity than it had appeared in mono, while the time-shifted one appeared richer and tighter at the expense of a slight “phasiness”.
Fig. 9
Fig.10 again shows the low frequency enhancement when switching the delay on and off, although this time the bandwidth shown is approximately 12kHz.
Fig. 10
Movie 6a
Movie 6b
The “prescribed delay” of Movie 6a might produce too much bottom end for many applications and also induces some phasing, although naturally this could be rolled off with subtractive EQ. Manual adjustment of the delay (as featured in Movie 6b) however provided a number of interesting textures, often featuring the “classic phasiness effect”.
4 Status Quo Drum Kit
The process was now iteratively applied to the multiple microphones of a drum kit. A fully mic’ed drum kit presents a hugely complex set of phase relationships, and after applying a well established set of rules and preferences, it is ultimately the skill of the sound engineer which defines the fine detail of the microphone placements.
The excerpt used for investigation was recorded by Gregg Jackman and Pip Williams for the Status Quo album, “In Search of the Fourth Chord”. These gentlemen are hugely capable and experienced, and it was regarded as quite a test of man versus machine to interfere with their work.
The following techniques were performed after completion of the album purely in a test scenario.
The microphones were a spaced pair of AKG 414s on cardioid with no bass pad for overheads, a pair of Neumann U87s on cardioid about 2.5 metres in front of the kit (and about 2.5 metres high) for ambience, a Shure SM57 on top of the snare and an Electro-Voice RE20 on the bass drum. Other close microphones (e.g. toms) were present, but were not considered in this paper.
In the case of the Upright Bass (see 3.1.4), the contact microphone provided the “timing reference”. It was decided to focus on the snare drum here. The rationale was twofold: it is perhaps the most important component of a drum kit (for backbeat orientated music), and it was physically placed in the middle of the microphone array, so phase errors induced in other elements of the kit would be minimised.
4.1 Overheads
A snare hit was selected. The overheads were visually inspected, and the earliest one ascertained. The zoom view can be seen in Fig. 11. The scale is in samples.
Fig. 11
The MAGIC algorithm was allowed to determine the optimum delay: Movie 7.
Movie 7
With the delay applied, the drum sounded more focussed and the transient was enhanced. There appeared to be a slight pitch shift artefact.
4.2 The Room Microphones
The overheads were then muted, and the ambient pair was considered. As can be heard in Movie 8, when time corrected similarly, the two ambient microphones showed similar qualities to the overheads. The actual ambience appeared slightly reduced which would imply that some of what had been originally favoured as desirable natural ambience was in fact comb-filtering. This is a common and often aesthetically pleasing feature of stereo recording.
Movie 8
4.3 Both Pairs
Fig. 12 shows the original timing differences of the two pairs. The upper two are the overheads, and the lower two are the ambients.
When the two locked pairs were monitored together, but as yet un-matched, there was a most pronounced difference in the sound. It was now possible to perceive a smearing of the snare transient in the un-delayed version. [A “side-effect” (not in this audio excerpt) was that the bass drum appeared lower in pitch, but not as full sounding. The ride cymbal had lost some of its sustain, and a slight flam was perceptible.] With the MAGIC-detected delay switched in, the snare displayed a more pronounced transient with a punchier tone: Movie 9.
4.4 Including the Snare Drum with both Pairs
The snare drum close mic was now included and delayed against the now-locked two pairs. As can be heard in Movie 10 (the room pair is delayed by 125 samples relative to the overheads, with each pair locked with its own L-R offset, and the snare close microphone delayed by 66 samples), there seemed to be a much more focussed body to the sound and a large increase in perception of the transient.
Movie 10
Other parts of the kit seem to benefit from the set of delays in Movie 10. The bass drum appeared “fatter”. In Movie 11, the Bass Drum microphone is muted, but the effect of the earlier delays can be heard (from the spillage on the other mics) if a Bass Drum hit is looped.
Movie 11
4.5 Adding the Bass Drum
The technique changed in order to optimise the Bass Drum. The loop had until now been solely on a Snare hit. Since it was now desired to treat the Bass Drum, running the MAGIC on the snare signal that had spilt into the Bass Drum microphone might not achieve this optimally, and so the dedicated Bass Drum signal was compared with all other channels as already optimised for the snare. As can be heard in Movie 12, this gave an effect of adding more body and transient to the Bass Drum, although with a clear pitch change artefact.
Movie 12
4.6 The Whole Kit
Movie 13
Movie 13: this example features a one bar phrase with both kick and snare. The ride can now also be heard, and should be considered. It is arguably harder to perceive the changes when listening to all three components together, but there is still a noticeable change; the reader is invited to appraise it for him or herself.
It was the author’s view that the modified recording had more power and focus, although naturally such a view is subjective and contextual to the accompanying music.
The sonogram did not yield interesting results, primarily due to the noise-like quality of drums and cymbals.
5 Application to Orchestral Recording: the addition of ambience
5.1 Background
Perhaps the ultimate challenge for the system is the analysis and processing of truly ambient recordings as typified in the contemporary orchestral techniques employed by Pip Williams and Haydn Bendall in the recording of the orchestral accompaniments for “Dark Passion Play”, an album by Finnish rock band “Nightwish”. These were recorded at Abbey Road Studio One, in December 2006 and February 2007. Details of the orchestra are given in Appendix A.
Whereas the delay compensations previously calculated were mostly based upon relatively dry signals, orchestral recordings contain a large proportion of natural reverberation. These reflections interfere with and complicate the relatively similar “dry” signals that each microphone might have captured. This reduces the ability of the MAGIC processor to accurately determine true source peak maxima.
The capture of reverberation is one of the most crucial components in any ambient recording. One classic example might be Ralph Vaughan Williams’ Fantasia on a theme by Thomas Tallis, by the Sinfonia of London conducted by Sir John Barbirolli.
… the Fantasia is solemn and ecclesiastical in mood and belongs to the acoustics of a cathedral such as Gloucester, where it was first performed. Barbirolli’s 1962 recording (his second; the first was in 1946) was made in the Temple church, part of the reason for its success. This venue was suggested by the American composer and conductor Bernard Herrmann, a friend and admirer of Barbirolli since Sir John’s years in New York. ‘There we went for a recording session that started at midnight to avoid any traffic noises,’ Ursula Vaughan Williams wrote later. ‘Coats and Thermos flasks were piled round the effigies of Crusader knights. Bernard was there, listening to the balance and to the music, and the resulting record is by far the best ever made of this work.’ [14]
This ambience is inextricable from the performance since the conductor and players react to it either consciously or subconsciously. Ambience has helped define many a performance. Accurate evaluation of the signals however, could only be performed if the ambience were effectively removed. Many desirable manipulations might be achieved in the above orchestral scenario, since the (ideally) anechoic extractions could be accurately compared and processed in order to achieve an optimum coherence. Obviously, the end result would feature this essential ambience returned indiscernibly in its original state. This is not currently possible with existing technology. Much work has been done on source separation, an approach that facilitates extraction of individual instrumental parts from (say) stereo recordings or mixdowns.
Woodruff et al [10] declare:
The difficulty of the source separation problem depends on the number of sources (instruments) in the recording and the number of sensors (microphones) used to make the recording. When the number of available audio channels (mixtures) equals or exceeds the number of individual sources (a quadraphonic recording of a trio, for example), one may use independent component analysis (ICA) [9]. The source separation problem is considered degenerate, or underdetermined, when the number of sources exceeds the number of mixtures. Standard ICA algorithms are not effective in the degenerate case.
In addition, Viste and Evangelista [11] demonstrated iterative source separation by minimizing the variance of the temporal envelopes of each source’s individual harmonics. This approach can still yield results with reverberant recordings, although it cannot work in so-called degenerate cases.
Paterson [8] proposed an algorithm to attenuate microphone spillage, which if implemented could yield the dry(er) signal required for other approaches to be successful, although a fundamental limitation of this approach is that it is based upon a point source generation of an impulse response. Whilst potentially adequate for working with a small number of spatially separated instruments, the physical size of an orchestra is likely to render this approach inadequate since there are so many acoustic paths to any given microphone.
6 The Nightwish Orchestral Sessions
The complications and limitations described in Section 5 are largely ignored here, and instead a series of experiments were performed with available audio signals from individual microphones. See Appendix B for details of the extensive microphone setup employed in the sessions. The orchestra can be seen in Fig. 13.
Phase cancellation and subsequent comb filtering is an inevitability of any multi-microphone recording, and the appraisal of this is of course wholly subjective. When considering placement and balance, a good engineer will find an optimum within the myriad of evils, but once defined and presented to the public in the form of a mixdown, only that optimum can be considered. These experiments allow the reader to explore some of the decisions and trade-offs that the engineer will (instinctively) consider when forming that optimum, together with some more lateral examples.
What follows are a number of excerpts comparing the effects of various temporal manipulations. Several microphone combinations were subjectively compared to consider the effect of using the Max patch for both subjective “phase locking” and manual intervention.
Fig. 13
6.1 Method
As a test-bed, the track “The Islander” was used. The assumption was that each track would be acoustically similar and was recorded in the same fashion.
Typically, a short passage of music was selected for looping and the patch asked to calculate any maximums. Once it locked, the sonic results are given, accompanied by video footage to assist the reader in understanding what is happening, usually by toggling the decreed delay(s) on and off. In some instances, a sonogram is also displayed. This passage might be a single note or, once delays were established, a longer musical phrase featuring the same delay. Various combinations of microphones are explored.
Please note that in some examples, looping clicks can be heard, and that there are some discrepancies with precise AV synchronisation due to the video compression (to Flash) necessary for web distribution. Obviously, stereo end results would be derived from the source signals, but these are ignored in this text in order to maximise perception of low level phase relationships and colouration, and so all excerpts are monophonic. Due to streaming optimisation, the audio format is mp3, 192kbit/s, 44.1kHz. The patch was actually operating with 24bit/44.1kHz Aiffs.
Such experiments are always likely to be controversial due to the purely subjective nature of the effect. It is in this spirit that the work is undertaken, and brief commentaries are offered on each example, but the reader is invited to form his or her own opinions of the resulting data. Good quality headphones are recommended for the keenest appraisal of results.
6.2 Results
6.2.1 Solo Violin Spot versus Violin Section Microphone
Movie 14
The spectrogram is set to a range of 100Hz-10kHz. There is an audible low frequency boost when the delay of 190 samples (4.3ms) is switched in. For this delay, a frequency of about 465Hz would experience phase reversal and so could be enhanced if it was being naturally attenuated in the original resultant signal.
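As a rough guide, the comb pattern produced by summing a signal with a 190-sample delayed copy of itself can be computed directly. This simple delay-and-sum model ignores any relative phase already present between the two microphone paths (which is precisely what determines whether a given frequency ends up enhanced or attenuated in practice), so it is offered only as an illustration of where the cancellations and reinforcements fall:

```python
fs = 44100
delay_samples = 190
delay_s = delay_samples / fs  # ~4.31 ms

# Summing a signal with a delayed copy of itself cancels frequencies for
# which the delay is an odd number of half-cycles, and reinforces those
# for which it is a whole number of cycles.
notches = [(2 * k + 1) / (2 * delay_s) for k in range(4)]
peaks = [k / delay_s for k in range(1, 5)]
print([round(f, 1) for f in notches])  # [116.1, 348.2, 580.3, 812.4]
print([round(f, 1) for f in peaks])    # [232.1, 464.2, 696.3, 928.4]
```

Note that a frequency near 465Hz falls on a whole-cycle multiple of this delay; whether it is reinforced or flipped in the final sum depends on the phase relationship the two microphone signals already had.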
B] A longer phrase
Movie 15
When this same delay is applied to a longer phrase, it might be considered harder to resolve the difference incurred, and so to this end it is switched in and out erratically towards the end of the excerpt. Harmonic changes can be detected. Obviously, any harmonics (other than the fundamental) that fall at the “emphasised” frequency will tend to be affected. This could lead to a perception of pitch change. The spectrogram does display the “induced drone” as the horizontal band just above the low frequency “columns”.
6.2.2 The Decca Tree: Left versus Right
A] Short Loop
Movie 16
The spacing on a typical Decca tree is well established, but it may be of interest to hear this excerpt (Movie 16), which focuses on a “low frequency instrument”, the timpani. The equivalent distance of this shift is about 87cm.
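The sample/distance equivalence used throughout these examples is a simple conversion via the speed of sound. The sample count for this particular movie is not stated in the text; the value of 112 samples below is inferred from the quoted 87cm at 44.1kHz and an assumed c of 343 m/s, and is used purely for illustration:

```python
def samples_to_metres(samples, fs=44100, c=343.0):
    """Equivalent acoustic path difference for a given sample delay,
    assuming a speed of sound of c metres per second."""
    return samples * c / fs

def metres_to_samples(metres, fs=44100, c=343.0):
    return metres * fs / c

# 112 samples is the delay corresponding to the 87cm quoted above:
print(round(samples_to_metres(112), 2))  # 0.87
print(round(metres_to_samples(0.87)))    # 112
```

The same conversion reproduces the figures quoted later in this section, e.g. 9 samples as roughly 7cm and 75 samples as roughly 58cm of on-axis movement.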
B] Longer phrase
Movie 17
When a longer musical passage is treated, the timpani is enhanced along with the droning harmonic sound. The spectrogram reveals this low frequency enhancement, and also an associated attenuation in the low-mid band.
6.2.3 The Decca Tree L versus C versus R
Movie 18
In Movie 18, all three elements of the Decca Tree are compared. The MAGIC algorithm estimated that the right microphone should be delayed by 114 samples relative to the left, and the centre by only 13. Again it is the low frequencies which are boosted, with the timpani appearing to exhibit a longer audible release. Towards the end of the excerpt, the reader will notice that only the right microphone is delay-toggled. The effect is still prevalent, but perhaps less pronounced.
It is speculated that the said delay might be attributed to the oblique triangle formed by the pair and the selected instrument’s precise location. Repeating the exercise whilst focused on different instruments might yield different results.
6.2.4 The Ambient Pair (Outriggers)
Left and Right outriggers with an iterative single sample increment of delay.
Movie 19
Movie 19: this example illustrates how the positioning of the ambient pair of microphones (Outriggers) alters the resultant timbre on a short string loop as the delay on the left microphone is incremented by a single sample iteratively. The interference patterns display different colourations as with the L+R Pair (6.2.5), although due to the relatively high amounts of ambience, this difference is masked to some degree since the overall effect is perhaps perceived as a “change in room size/shape”; simply a different interaction of reflections. Such an approach might prove useful for ascertaining any desired delay offset after recording, which can simply be locked or used as a starting point for manual tweaks. This excerpt may also be of interest to minimalist composers!
6.2.5 Left and Right: AKG C422 Stereo microphone
A] Manual delay adjustment
Movie 20
In addition to the Decca Tree with its associated Outriggers, there was a stereo microphone, an AKG C422 (cryptically referred to hereafter as the “pair”), placed centrally. Obviously, interfering with the classic design of such a venerable microphone is close to sacrilege for many; however, it was done in a (mischievously) scientifically curious spirit, and it might be of interest to readers to hear it being “re-designed”, as well as to imagine that it was a physical pair (with an ideal reference positioning) capable of on-axis movements.
Movie 20: Virtual microphone placement, moving along its axis, is demonstrated by application of a delay line on the right microphone, manually controlled to sample accuracy by a slider in the Max Patch. At different values it can be noted how the timbre changes, accentuating various harmonics and performance nuances.
At a delay of 9 samples a dullness can be observed, characteristic of phase cancellation in higher audible frequencies. Around 73 to 75 samples, an enhanced tremolo can be detected. This could be due to a cancellation at about 589Hz that attenuated the principal tone, allowing the backing simply to appear more prominent. At 87 samples an emphasised pure musical tone can be detected.
The brief 9 sample delay (near the beginning of the excerpt) is equivalent to about 7cm of on-axis microphone placement, typical of what an engineer might implement manually in an actual two microphone situation. Consideration of this portion of the clip demonstrates the importance of such placements.
B] With a fixed delay
Movie 21
To focus on one example from above, the delay of 75 samples in Movie 21 is now locked into the system to facilitate A/B comparison. The principal tone appears cancelled, allowing the tremolo backing to appear accentuated at its expense. Such phenomena are well understood in the microtiming community, but are presented here in the context of microphone placement. The 75 sample delay is equivalent to approximately 58cm of on-axis movement, thus providing an illustration of the subjective effect of such placement. Whilst such a large adjustment might not be readily made during set-up, the same effect would occur from the relative positioning of a given instrument from the idealised focus of the stereo pair. It is notable that this can be considered after the recording has taken place. Obviously all this is hypothetical hyperbole!
6.2.6 Ambient Pair versus Stereo Pair
This short loop in Movie 22 features a Timpani hit as picked up by both the outriggers and the stereo pair. One pair is adjusted relative to the other pair. When listening to both pairs as recorded, a classic orchestral sound can be heard, which on close listening reveals a “slapback” quality induced by the time differences in the transient signal reaching each pair. It is reminiscent of a slight glissando into the note, and might be induced by the natural pitch-bend of any membranophone at the moment of impact. When an auto-detected delay of 173 samples is applied to the Ambient pair however, the transient appears to coalesce and the characteristic low frequency enhancement is observed. 173 samples equates to a fundamental between B2 and C3 (the wrong register), and so the observed effect is more likely to be attributable to the relative placement of the two pairs.
6.2.7 The Decca Tree Pair versus the Stereo Pair
A] Short Loop
Movie 23
The two pairs are offset against one another in Movie 23, and a degree of enhancement can be perceived when the delay is applied, in the form of an F sharp from the violas (the 5th of the chord). This note was not perceived to be as loud in the ambient mix and its accentuation is a function of the signal summation. This might imply that such techniques would facilitate retrospective “re-balancing”, although due to the fixed note capture range of such purely phase-based efforts, it may not be as versatile as might be desired. With the aid of a score as utilised by Woodruff et al. [10], dynamic delay methods might be explored, although it is likely that the unwanted artefacts would outweigh potential benefits.
As mentioned earlier in the background to this section, current source separation techniques are developing, but have yet to fully succeed in an ambient environment.
B] Longer Loop
Movie 24
The violin is clearly affected in the longer phrase of Movie 24.
6.2.8 Pair Left versus Tree Left
Movie 25

Movie 25 explores the half of the stereo image contributed by the summation of the left microphones of both the pair and the tree. An equivalent investigation could be performed on the right. With 51 samples of delay applied, a harmonic is accentuated. Had the pair been about 39cm from the Left Tree microphone, this might have happened naturally. It is of course possible that the Centre Tree microphone might have cancelled this effect.
6.2.9 Tree Right versus Ambient Right
A similar phenomenon occurs between the right microphones of the outrigger pair and the Decca pair in Movie 26.
6.2.10 Tree Left versus Ambient Left versus Pair Left
Movie 27
When the left microphones of all three pairs are brought together in Movie 27, there is a perceptible high frequency change. Notice the momentary squeak on the violin, which is exaggerated with the delays applied. Again, a single tone is accentuated. The transients of the harp broken chords are enhanced.
6.2.11 All three Decca Tree Microphones with the Solo Violin spot
Movie 28
When the solo violin spot microphone is delayed manually (via the slider) relative to all three tree microphones in Movie 28, there are noticeable timbral fluctuations. Whilst the violin’s own sound remains relatively pure, it is the “backing” which experiences the more significant effect from the (virtual) placement of the spot. This manual control would allow for a much more artistic adjustment of the sound than a pure computer determined offset. The process is entirely analogous to shifting the spot microphone along its axis, just as a human engineer might do. In this instance however, that shift is possible after the recording has taken place.
6.3 Summary
Due to the large number of microphones placed in this session, the number of combinations which might have been compared in this exploration is enormous. Rather than being exhaustive, this text has simply attempted to display a range of artefacts induced by such manipulations. In an extensive recording scenario such as that described in Appendix B, options were provided rather than a necessity to actually blend all available signals. It is envisaged that the technique might be applied in an explorative fashion whenever spot microphones or multiple pairs are employed in a final mix, in order to inform an artistic judgment. Leonard [13] noted that the application of delay was most pronounced when spots were delayed relative to a dummy head, and indeed that would be an interesting scenario on which to test this algorithm.
Should it ever be desired to impose either the delay suggested by MAGIC or indeed one selected manually, the software facilitates a simple “export” function which prepends the audio with the exact number of samples required to elicit the preferred effect, allowing work to continue in the user’s preferred DAW. The reader might be interested to hear Audio 1; a stereo mix of the same excerpt that was constructed by Pip and Haydn from a selection of the numerous microphones (see Appendix B) beyond just those considered here. NB For technical reasons here, the end fade of this excerpt is slightly truncated and does not accurately represent the reverb tail.
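The export behaviour described above amounts to prepending the chosen number of silent samples to the file. A minimal sketch follows; the function name is illustrative, and 16-bit WAV is used here simply because Python's standard-library `wave` module supports it directly, whereas the patch itself writes 24-bit AIFF:

```python
import struct
import wave

def export_with_offset(samples, delay_samples, path, fs=44100):
    """Sketch of an 'export' step: prepend `delay_samples` of silence so
    that importing the file into a DAW reproduces the chosen time shift."""
    shifted = [0.0] * delay_samples + list(samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit PCM
        f.setframerate(fs)
        f.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in shifted))

# e.g. lock in the 75 sample delay chosen for the stereo pair's right channel:
# export_with_offset(right_channel, 75, "pair_right_delayed.wav")
```

Once all exported files start from the same origin, they line up sample-accurately on import without any further DAW-side nudging.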
7 Conclusions
The algorithm and associated processes have been shown to exert a noticeable effect on various forms of audio. The results are of course subjective. It is clear that in using purely time domain manipulations of this fashion, phase artefacts will be inevitably introduced at the expense of any transient (or more general amplitude) “correction”. The question is whether that is preferable or even acceptable. Much technology is developing to separate transients and steady states, but this is less analogous to the simple microphone placement in whose spirit this work developed.
Leonard [13] found that 14 subjects (a mixture of sound engineers and musicians) all preferred delayed spot microphones, despite using a method of calculating delays which did not take account of humidity variations in a given session, and applying a delay resolution of only 0.1ms. Section 3.1.1 above discusses the need for sample accuracy; the MAGIC system implements this. In more recent times it has been said that humidity (a function of human exertion in performance and possible audience presence) can vary in as little as 15 seconds to the point of affecting accurate capture of Impulse Responses, due to the variation in high frequency absorption. This variation might conceivably affect the perceived speed of sound used to calculate the delays. The algorithm presented here is also susceptible to such variations, but strategies to maintain timbral consistency could be sought when recording longer pieces such as a whole symphony, since temperatures might rise by several degrees over the duration of a performance. One such strategy might be as simple as inserting inaudible edit cuts between movements of a symphony and applying MAGIC separately for each. Such efforts might form a separate discussion.
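The scale of this drift is easy to estimate with the standard linear approximation for the speed of sound in dry air (humidity adds a further small correction not modelled here). The 10m path and 4-degree rise below are illustrative assumptions only:

```python
def speed_of_sound(temp_c):
    """Linear approximation for dry air, in metres per second."""
    return 331.3 + 0.606 * temp_c

def path_delay_samples(path_m, temp_c, fs=44100):
    """Sample delay accrued over a given acoustic path at a given temperature."""
    return path_m * fs / speed_of_sound(temp_c)

# A spot microphone 10m from the main pair, with the hall warming by 4 degrees
# over the course of a performance:
drift = path_delay_samples(10.0, 18.0) - path_delay_samples(10.0, 22.0)
print(round(drift, 1))  # 9.1
```

A drift of some nine samples is well above the single-sample resolution argued for in Section 3.1.1, which supports the case for per-movement recalculation.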
It may be that the delayed versions presented here are considered notable in some excerpts, either favourably or otherwise, but the reader must remember that the amplitude optimisation offered by the MAGIC algorithm is itself subjective and highly dependent on the greater energy content of the fundamental frequency of lower pitches at any given moment; sonic artefacts apparently induced will therefore vary in longer passages, and their impact will subsequently be diluted. If the system were implemented with a high-pass filter in the MAGIC detection path, it might be possible to obtain interesting results beyond the predominant low frequency boosting.
The Multi-Microphone examples feature all signals at unity gain. It is envisaged that in any real-world situation, more subtle balances would be sought.
Leonard [13] found favourable perception when applying artificial ambience to spot signals to maintain the psychoacoustic perception of distance relative to any principal stereo pair. With current technology, it is suggested that creating such an ambience with a convolution reverb unit might yield the most authentic results. This impulse response should of course be created preemptively with the orchestra in place to account for the human absorption factor, although a typical high energy sine-sweep would induce some fascinating resonant responses from the individual instruments, probably to the great consternation of the players.
In cases where the amplitude is increased, the SNR will be similarly increased since, due to the very nature of noise, it cannot display constructive interference in response to a temporal shift. This is an obvious benefit of the approach described here. It may be that the prevalent low frequency enhancement is not desirable in the musical content, but subtractive EQ might temper this whilst still retaining some SNR benefit. It is also notable that whilst a single sample of delay can be audible in certain conditions, there appeared to be a “capture-range” within which the time-correction would exhibit a degree of the desired effect. It may be this which led Anazawa and Takahashi [12] to their 1ms requirement referred to in Section 3.1.1.
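The SNR argument can be illustrated numerically: when two time-aligned microphone signals are summed, the correlated signal amplitudes add (a 6dB gain), whereas the independent noise floors add only in power (roughly 3dB), for a net SNR improvement of about 3dB. The sketch below uses synthetic signals purely for illustration:

```python
import math
import random

random.seed(1)
n = 20000
fs = 44100
signal = [math.sin(2 * math.pi * 100 * i / fs) for i in range(n)]
noise_a = [random.gauss(0.0, 0.1) for _ in range(n)]
noise_b = [random.gauss(0.0, 0.1) for _ in range(n)]

def power(x):
    return sum(v * v for v in x) / len(x)

# Two perfectly aligned microphone signals, each with independent noise:
summed_sig = [2 * s for s in signal]                      # amplitudes add: +6 dB
summed_noise = [a + b for a, b in zip(noise_a, noise_b)]  # powers add: ~+3 dB

snr_single = 10 * math.log10(power(signal) / power(noise_a))
snr_summed = 10 * math.log10(power(summed_sig) / power(summed_noise))
print(round(snr_summed - snr_single, 1))  # ~3 dB improvement
```

In practice the gain is smaller wherever the two signals are only partially correlated, but the principle holds for the aligned components.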
An interesting sub-text is the formation of an environment for easily applying minute delays to the various components of a multi-microphone recording. It is this ability to manually interact with the reference micro-timing of multiple sources that may interest some readers. It allows not only overrides of the machine’s estimation according to taste, but also the convenience of a sound-shaping tool based on phase alone.
Such intervention is certainly not new and can easily be applied in most modern DAWs, but arguably this software does draw one to consider such action and offers a user-friendly and tactile approach to investigating the effect of multiple delays both between single signals and stereo pairs. Although not discussed in the text, this tactility is enhanced further through the implementation of a hardware control surface to operate individual faders.
Readers might consider investigating their preferred recording scenarios similarly.
Ultimately, whichever of the above approaches is taken, the effect is only equivalent to the art of the sound engineer, who will make minuscule adjustments to the positioning of microphones in a session, evaluating the monitor mix to choose final placements. The author insists that this art must remain paramount to any recording and will ultimately offer the most aesthetically pleasing results.
Whilst axis-related polar responses are not considered in this software or text, on-axis “distance-related” positioning becomes readily adjustable, but crucially after initial capture. The system cannot, however, represent this wholly accurately, since the ambient reflections are being time-shifted in accordance with the direct signal, when in fact their real-world time delay would be a trigonometric function of their acoustic path from source to microphone.
Whilst a good conductor with a good orchestra will hone the performance balance of the music, there will always be situations where the producer might wish to intervene retrospectively. Such situations might typically arise from the rendition of non-standard repertoire, often when time is pressured such as in soundtrack recording, and it is these that might just benefit from the approaches described above. Such methodology might even be of use when setting up microphones before actual recording, simply to inform the actual physical placements, subsequently disregarding the software.
Discussion of aesthetic quality has been avoided wherever possible; the commentaries provided with each example are simply to prompt readers to appraise the audio themselves, formulate individual opinions and encourage debate. Having said this, the “studio” recordings appeared more favourably responsive to the time shifts than the ambient orchestral ones. This is entirely predictable since by definition, the ambient ones have less defined waveforms for the MAGIC to work on. In particular, the bass and drum-kit might be singled out as aesthetically pleasing. The bass displayed a low frequency boost, and the drums notably displayed enhanced transients.
Eminent engineer Gregg Jackman bases his whole recording practice around transients. He told the author:
Transient response is massively more important than frequency response. This is why people like recording to tape. It limits transients. Different speakers and microphones respond to transients differently; their frequency response curves all look about the same.
8 The Last Word
This body of work offers rather new approaches based on well-understood technology and the most simple of ideas. The bottom line is whether the producer, engineer or artist prefers the “corrected” or adjusted sound. Grammy award winning producer Pip Williams told the author:
Sometimes the client prefers it out of phase. Both “Status Quo” [in the past] and “The Moody Blues” have preferred the snare drum out of phase!
ACKNOWLEDGEMENTS
Thanks to Pip Williams for producing, Gregg Jackman for recording and Matthew Letley for performing the drum excerpt, taken from Status Quo’s “Pennsylvania Blues Tonight”, from the album “In Search of the Fourth Chord”. Thanks also to Pip and Gregg for their views on this work and its background. Even further thanks are due to Pip for the use of the Nightwish orchestral sessions and the session data.
Thanks also to John Edwards for performing the bass excerpt and Paul Borg for recording it; taken from “The Making of Quiet Things” by “The Number”, produced by the author. Thanks to Drew Downing for the guitar performance and recording of that excerpt, Sebastian Lexer for the small piece of code which the “Write to HD” section of the patch is based on, and Stephen Frost for sharing his experience of the great concert hall recordings.
REFERENCES
[1] J. Paterson, “Set Phasors to Stun: an algorithm to improve phase coherence on transients in multi microphone recordings”, ICA2007 Madrid, Proceedings, 2-7 Sept. 2007
[2] B. Bartlett, "A Scientific Explanation of Phasing (Flanging)", J. Audio Eng. Soc., vol. 18, no. 6, pp. 674-675, 1970
[3] C. H. Knapp and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 24, no. 4, pp. 320-327, Aug. 1976.
[4] B. Champagne, S. Bedard, and A. Stephenne, “Performance of time-delay estimation in the presence of room reverberation,” IEEE Trans. Speech Audio Processing, vol. 4, no. 2, pp. 148- 152, Mar. 1996.
[5] J. Benesty, “Adaptive eigenvalue decomposition algorithm for passive acoustic source localization,” Journal Acoust. Soc. of America, vol. 107, no. 1, pp. 384-391, Jan. 2000.
[6] S. J. Choi, Y.-W. Jung, H.-G. Kang and H. J. Kim, "Adaptive Microphone Array with Self-Delay Estimator", AES 29th International Conference, Sept. 2006, pp. 2-4
[7] C. Duxbury, M. Davies, M. Sandler, “Separation of Transient Information in Musical Audio using multiresolution Analysis Techniques” Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, Dec. 6-8, 2001
[8] J. Paterson, “Killing Spillage” Proceedings of the Art of Record Production Conference, Edinburgh, Scotland, 8-10 Sept. 2006
[9] J.V. Stone. Independent Component Analysis: A Tutorial Introduction, Cambridge, Mass.: MIT Press, 2004.
[10] J. Woodruff, B. Pardo and R. Dannenberg. “Remixing Stereo Music with Score-Informed Source Separation”, ISMIR 2006, 7th International Conference on Music Information Retrieval, Victoria, Canada, 8 – 12 October 2006
[11] H. Viste and G. Evangelista. “A Method for Separation of Overlapping Partials Based on Similarity of Temporal Envelopes in Multi-Channel Mixtures”, IEEE Trans. on Audio, Speech and Language Proc., in press.
[12] T. Anazawa and Y. Takahashi, “Digital Time Coherent Recording Technique,” Audio Engineering Society Preprint, 2493 (H-2), October 1987, 3-4.
[13] T. Leonard, “Time Delay Compensation of Distributed Multiple Microphones in Recording: An Experimental Evaluation” Presented at the 95th Convention of the AES, New York, Preprint 3710, 1993
[14] Elgar/Vaughan Williams: String Orchestra Works, (2000). [Audio CD]. Great Recordings of the Century (EMI Classics) ASIN: B00003ZKRL. Taken from sleeve notes.
Appendix A
Nightwish- Orchestra information
The orchestra comprised the top session players available, including members of The London Session Orchestra and members of leading symphony orchestras (All regular film score players).
The orchestra adopted the name “The Dark Passion Play Orchestra” for this project, after the album name “Dark Passion Play”.
Songs were mainly composed by Tuomas Holopainen.
The choir were Metro Voices (who featured on the special Honda TV commercial).
Recorded at Abbey Road Studio One, Dec 18th-22nd 2006 (the first two days were orchestra and choir recordings) and Feb 22nd-24th 2007 (all for orchestra and choir).
So, five days in total for orchestra and choirs; the rest for overdubs.
Orchestra Leaders:
Gavyn Wright (Dec 06)
Perry Montague-Mason (Feb ‘07)
Choirmaster:
Jenny O’Grady
Conductor:
James Shearman
Orchestral Contractor:
Isobel Griffiths, assisted by Leila Stacy
Music Preparation:
Richard Ihnatowicz
Recording Engineer:
Haydn Bendall
Pro Tools Engineers and assistants:
Sam Okell, Richard Lancaster (with Andrew Kitchen, Robin Baynton)
Orchestra Arranged, Orchestrated and Directed by:
Pip Williams
Orchestral Lineup:
1 x Flute (doubling Piccolo, Alto Flute, Whistle)
1 x Flute (doubling Alto Flute)
1 x Oboe (doubling Cor Anglais)
2 x Clarinets (doubling Bass Clarinets)
1 x Bassoon (doubling Contra Bassoon)
4 x French Horns in F
4 x Trumpets (doubling Flugelhorns)
2 x Tenor Trombones
1 x Bass Trombone
1 x Tuba
1 x Percussion
1 x Percussion (doubling Tuned Perc)
1 x Timpani
1 x Harp
12 x 1st Violins
10 x 2nd Violins
10 x Violas
8 x Celli
4 x Basses
This was a total of 66 players.
On a couple of songs, this was very slightly reduced.
Main Choir:
8 x Sopranos
6 x Mezzo Sopranos
6 x Altos
6 x Tenors
4 x Baritones
2 x Basses
Gospel Choir:
3 x Sopranos
3 x Altos
3 x Tenors
3 x Baritones
(Gospel choir was double tracked.)
Appendix B
Microphones used
Main Room:
Decca Tree above conductor’s head: 3 x Neumann M50 (each set to Omni polar pattern), plus AKG C422 Stereo mic
Room outriggers (ambience) Placed facing the whole orchestra, wide and about 4 metres high: 2 x Neumann M50 (Omni)
Orchestra:
1st & 2nd Violins
Each section utilised 2 x Neumann U87 (Cardioid), placed along each line of players (12 x 1sts and 10 x 2nds). Mics were approx 2.5 metres high. Placed apart so as to cover the whole section.
Violas
2 x Neumann TLM 170i (Cardioid) placed as above (10 players). Again, approx 2.5 metres high. Placed apart so as to cover the whole section.
Celli
1 x Neumann U47 (FET type, on cardioid), aimed at the section of 8 players approx 1.75 metres high. Placed so as to cover the whole section.
An additional FET U47 was aimed at the 1st cello, so as to cover the various solo parts.
Basses
2 x Valve Neumann U47 (Cardioid) placed one between each pair of players (4 players in all). Just below the bass bridge, 2ft in front.
Harp
Was situated roughly between and at the back of the 1st & 2nd violins.
A single DPA (B&K) 4011 was placed approx 0.75 metres away, aimed at the instrument sound board.
Woodwinds
Front row- 2 x Flutes/Alto Flutes/Piccolo etc, 1 x Oboe/Cor Anglais
Back row- 2 x Clarinets/Bass Clarinet. 1 x Bassoon/Contra Bassoon.
4 x Neumann KM86i (Cardioid). Placed slightly above and approx 1.25 metres away from each instrument group.
French Horns
4 players. 1 x Neumann KM86i (Cardioid) Behind section. 2 metres.
Brass
4 x Trumpets, 2 x Tenor Trombones, 1 x Bass Trombone, 1 x Tuba.
STC/BBC Ribbon mics.
Between 2 and 4 were used, depending on song. Tuba was reinforced with a single Neumann U87 approx 1 metre away.
Timpani
Single Neumann KM84i, approx 2 metres above and overlooking all drums.
Percussion
4 x DPA (B&K) 4011 were deployed and moved over all tuned and other percussion, as required, approx 1.75- 2 metres above and just in front of each instrument.
Choir
Ladies (Sopranos, Mezzo Sopranos and Altos) A total of 6 x Neumann KM86i’s on Cardioid. 1.5-2 metres in front of and 2.5 metres from ground. Spread along sections.
Men (Tenors, Baritones, Basses). 3 x Neumann U87’s. Placed similarly.
In addition, 2 x DPA (B&K) 4011’s were placed approx 3-4 metres in front, for ambience.
Gospel Choir
4 x Neumann U87, one for each section. Approx 1 metre in front.
Soloists
Uilleann Pipes. 2 x Neumann U87 in XY configuration.
Celtic Fiddle (NOT in The Islander!) DPA (B&K) 4011. 1 metre above.
Boy Sopranos 1 x Neumann U87
Cymbalom 2 x DPA (B&K) 4011. 1 metre above, Near Coincident Pair.
Solo Violin on “The Islander”.
A spot mic (U87) was employed but may not have been used finally!