It seems that the idea of recording, storing and processing audio at very high sample rates has become fashionable; more precisely, at sample rates far beyond those suggested by the Kotelnikov/Nyquist/Shannon theorem. Over the last weeks I ran into several debates and disputes on the matter, and thought it would be wise to lay out several of my concerns about this trend in proper, unconvoluted form. With this in mind, let's restrict the discussion to a pure processing context.
This includes both high-rate formats such as 192kHz PCM and "DXD" (i.e. 384kHz PCM) aimed at preserving the original recording, as well as the concept of manual, "DIY" resampling for processing purposes.
The central question here is:
Is it really beneficial to process at higher rates, and if so, how much higher? And where does it stop?
Some better-known facts:
- It is no secret that the concept of oversampling can prevent, or at least reduce, the amount of aliasing generated by nonlinear processes.
- Similarly, it is a fact that most AD/DA converters tend to sound better at higher rates, mainly because higher rates allow them to use less aggressive filtering.
These two arguments are perfectly reasonable. With this in mind, the most intuitive reaction is a generalization close to "the more, the better". The experiment I am about to describe can be seen as a counterweight to these points. Here's the idea: let's imagine two guys recording the following analogue signal:
It consists of two sine waves, one at 10kHz, the other at 24kHz. So far so good, high sample-rate ADs can handle the task. The "high-end" guy captures a truly high-fidelity replica of the original.
Alternatively, a "lofi" guy refuses to use any rate higher than 44.1kHz and records this:
As we can easily see, a sample rate of 44.1kHz fails to capture the 24kHz sine wave: the tone lies above the 22.05kHz Nyquist limit, so the converter's anti-aliasing filter removes it. However, the overwhelming majority of tests and papers published on the subject clearly show that the human hearing system is not able to detect a 24kHz sine wave (in a linear environment; more about this below). As such, it is safe to expect that the difference between the two recordings is inaudible.
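To make the setup concrete, here's a minimal numpy sketch of both captures. The amplitudes, the one-second duration and the 384kHz "high-end" rate are my own assumptions, and the lofi version simply omits the 24kHz partial, as an ideal anti-aliasing filter would:

```python
import numpy as np

FS_HI = 384_000  # assumed "high-end" capture rate (DXD-style)
FS_LO = 44_100   # the "lofi" capture rate
DUR = 1.0        # seconds of test signal

def tone(freq, fs, dur=DUR):
    """One sine partial at the given frequency and sample rate."""
    t = np.arange(int(fs * dur)) / fs
    return 0.5 * np.sin(2 * np.pi * freq * t)

# high-end capture: both partials fit comfortably below Nyquist (192kHz)
hi_capture = tone(10_000, FS_HI) + tone(24_000, FS_HI)

# lofi capture: 24kHz lies above the 22.05kHz Nyquist limit, so the
# converter's anti-aliasing filter removes it before sampling
lo_capture = tone(10_000, FS_LO)
```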
OK, both guys now decide to "master" their great recordings with a limiter at the end of the chain. This limiter is a relatively aggressive nonlinear device, so it will potentially introduce both harmonic distortion and its vicious brother, intermodulation distortion.
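Since the exact limiter doesn't matter for the argument (the setup at the end actually uses a simple saturator), here is a stand-in nonlinearity continuing the sketch above; the tanh curve and the drive value are my own choices:

```python
import numpy as np

def saturate(x, drive=2.0):
    """Memoryless saturation curve standing in for the 'limiter'.
    Any static nonlinearity like this adds harmonics of each input
    partial plus intermodulation products between the partials."""
    return np.tanh(drive * x) / np.tanh(drive)

lo_mastered = saturate(lo_capture)  # harmonics of 10kHz only
hi_mastered = saturate(hi_capture)  # harmonics of both partials, plus IMD
```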
The "lofi" guy does just this, and ends up with the following:
Fine, we see some odd-order harmonics appearing in the signal.
The high-end guy giggles, fires up his high-rate processing chain and ends up with this:
Not good. We see odd-order harmonics, but also a disturbing amount of so-called intermodulation distortion (IMD; see Wikipedia: Intermodulation).
Again, both sources sounded exactly the same, but the results differ badly. See the new peak at 4kHz? It is the third-order difference product |2 × 10kHz − 24kHz| = 4kHz. It will land in its own critical band (no masking) and become annoyingly audible.
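The arithmetic behind that peak is easy to verify; this snippet lists the standard second- and third-order intermodulation products of the two source partials:

```python
f1, f2 = 10_000, 24_000  # the two source partials, in Hz

# second- and third-order intermodulation products m*f1 + n*f2
for m, n in [(1, 1), (1, -1), (2, 1), (2, -1), (1, 2), (1, -2)]:
    print(f"{m}*f1 {n:+d}*f2 -> {abs(m * f1 + n * f2):6d} Hz")
# 34000, 14000, 44000, 4000, 58000 and 38000 Hz -- the 4kHz product
# only exists because the 24kHz partial was still in the signal
```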
Now, here's my question: who ends up with the more faithful representation of the original?
A higher bandwidth doesn't equal "better". In fact, when it comes to nonlinear processing, it's better to use the narrowest tolerable input bandwidth (given properly anti-aliased processors).
This is a simplified experiment which expects the plugin to control everything related to aliasing internally. My point here is the relation between processing bandwidth and IMD. The experiment clearly demonstrates the practical limits of high-rate processing (i.e. "more is not better"), in particular for chains of processors running at a very high rate.
Additionally, let me highlight another point. Many plugins use internal resampling, which in turn seriously hits CPU performance. The idea of increasing the sample rate once before processing nonlinear stuff and reducing it afterwards (saving a lot of CPU power) is tempting.
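For reference, such internal resampling amounts to roughly the following, sketched around the saturate() stand-in from above; the 4x ratio is an arbitrary assumption, and scipy's resample_poly supplies the necessary anti-imaging/anti-aliasing filters:

```python
from scipy.signal import resample_poly

def oversampled_saturate(x, ratio=4):
    """Upsample, apply the nonlinearity, then band-limit and decimate.
    The distortion products now land below the *oversampled* Nyquist,
    and the final downsampling filter removes everything above the
    original Nyquist instead of letting it alias back down."""
    up = resample_poly(x, ratio, 1)     # upsample by `ratio`
    up = saturate(up)                   # nonlinear stage at the high rate
    return resample_poly(up, 1, ratio)  # filter and downsample again
```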
However, there is a problem. As we've seen above, it is very important to restrict the bandwidth before any form of nonlinear processing. This asks for some form of bandwidth limitation, typically right above the audible range (a sketch of such a filter follows the list below). This leads to several contradictions:
- What's the point of using super high rates (and bandwidths) if every nonlinear processor has to restrict the bandwidth anyway?
- The user needs expert knowledge to apply this filtering at the right places.
- Linear processes such as EQing, delay, reverb and the like do not extend the bandwidth and thus don't suffer from aliasing effects. In the case of feedback-based algorithms, higher rates even directly result in lower precision (recursive filter coefficients become increasingly sensitive as the poles crowd the unit circle). Running these at high rates is clearly not beneficial (given a high-quality processor).
- Non-technical users typically have no idea which processes really benefit from a higher rate.
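As announced above, here's the kind of band-limiting stage the DIY approach would force in front of every nonlinear processor. This is a sketch only: the cutoff, filter order and the offline zero-phase filtering are my assumptions; a real-time processor would use a causal filter instead:

```python
from scipy.signal import butter, sosfiltfilt

def band_limit(x, fs, cutoff=22_000, order=8):
    """Restrict the input bandwidth to just above the audible range so
    that fewer partials are available to intermodulate in the following
    nonlinear stage."""
    sos = butter(order, cutoff, btype='low', fs=fs, output='sos')
    return sosfiltfilt(sos, x)

safe_input = band_limit(hi_capture, FS_HI)  # ~22kHz of bandwidth left
```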
This brings me to the conclusion that it is more reasonable to use standard rates and let the plugin/developer keep his promises, clean up his own mess afterwards, and do his best to work in the most efficient manner. This is more reasonable than DIY oversampling.
Experiment setup: I used a saturator to simulate the effect of a "limiter". I tried using a popular limiter, but it turned out to create so much IMD on its own that I preferred a more "visually appealing" type of nonlinearity (in this case a rather simple saturator). Here's the test setup:
[screenshot of the test setup]
And here's the Reaper project file (beware, it's set to a sample rate of 384kHz; if in doubt, use Reaper's "Dummy audio" driver):
http://www.tokyodawn.net/labs/public/testsetup.RPP