A couple weeks ago, I talked to you about how to change the size of your uncompressed .wav files. There was some mathy-math in there, and one of the pieces of that puzzle that I kind of glossed over was this idea of the sample rate.
Do you know what sample rate you record at? For the audio I work with, it’s usually either 44.1kHz or 48kHz. There’s not much practical difference between the two, except that if you create a project using a mix of tracks recorded at different sampling rates, you’ll need to conform one to the other before you start. In any case, I generally record at 44.1kHz, and most of the other people I work with do too.
But where does 44.1kHz come from?
The simple version is that it’s the sample rate that was adopted as the standard for compact discs. 44.1kHz later carried over as the usual rate for .mp3 audio files too, mainly because they were so often ripped from CDs.
The history of why CDs were sampled at 44.1kHz is itself worth noting…And it has to do with math. Again. Sorry.
Broadly speaking, humans can hear frequencies between 20 and 20,000Hz (or 20kHz). Which by the way is an excellent podcast, if you haven’t already heard it. Tell Dallas Taylor I sent you. Anyway…frequencies higher than 20kHz are what we call ultrasound, and frequencies lower than 20Hz are called infrasound. If you want to accurately reproduce an analog signal with frequencies up to 20kHz, then the Nyquist-Shannon sampling theorem says that you need to sample that signal at more than double the maximum frequency – in this case, anything above 40kHz.
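If you like seeing that arithmetic spelled out, here it is in a few lines of Python (the variable names are just mine, nothing standard about them):

```python
# Nyquist criterion: the sample rate has to exceed twice the highest
# frequency you want to capture.
f_max = 20_000  # rough upper limit of human hearing, in Hz

nyquist_floor = 2 * f_max
print(nyquist_floor)               # 40000 -> you need more than 40 kHz

for fs in (44_100, 48_000):
    print(fs, fs > nyquist_floor)  # both common rates clear the bar
```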
I won’t go into all the details behind the Nyquist-Shannon sampling theorem, mostly because I’m a little rusty on my calculus. But here’s a little tiny primer. One of the tools we use a lot in audio engineering is my friend, the Fourier Transform. Nearly every function you might use in post-processing makes use of it…it’s basically what takes a waveform and extracts the frequencies that make up that wave. It’s used in the spectral display, in EQ, in low- and high-pass filtering, just to name a few. You can take a signal, transform it into the frequency domain, perform operations on it, and then invert it back into the time domain. But if you don’t sample a signal often enough in the time domain, you run into what’s called aliasing…in other words, there isn’t enough information to tell the highest frequencies apart from lower ones, so when you reconstruct the signal they fold back down as frequencies that weren’t in the original. Hence…if you want good resolution up to 20kHz, the limit of human hearing, you need to sample above 40kHz.
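If you want to see aliasing happen for yourself, here’s a quick sketch in Python with NumPy (the 25kHz tone and the names are just ones I picked for illustration, not anything out of a particular tool): generate a pure tone above the Nyquist limit, sample it at 44.1kHz, and look where the energy actually lands in the spectrum.

```python
import numpy as np

fs = 44_100        # sample rate in Hz
f_tone = 25_000    # a tone above the Nyquist limit of fs / 2 = 22,050 Hz

# One second of the tone, sampled at fs.
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * f_tone * t)

# Transform into the frequency domain and find the strongest bin.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak = freqs[np.argmax(spectrum)]

print(f"Tone generated at {f_tone} Hz")
print(f"Spectrum peaks at about {peak:.0f} Hz")  # ~19,100 Hz: the alias (fs - f_tone)
```

The 25kHz tone doesn’t show up at 25kHz at all – it folds back down to around 19.1kHz (44,100 minus 25,000), which is exactly the kind of not-quite-the-original result I was talking about.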
There are some other reasons that 44.1kHz stuck, including the fact that 44,100 is the product of the squares of the first four prime numbers (2² × 3² × 5² × 7² = 4 × 9 × 25 × 49 = 44,100), and that the extra 4.1kHz gives you a little wiggle room when you’re low-pass filtering down to 20kHz. But mostly, that particular value is used because that’s what Sony used when they first started recording digital audio onto video cassettes back in the late 1970s. It kept going into the CD era (hands up who had a Sony CD player), and it continues to this day. It’s got stiff competition from 48kHz though, which has gained in popularity because of its relative ease of use with digital video equipment. But that’s a topic for another day, I think…