There's a lot of confusion about what sample rate to use for YouTube.
YouTube's own guidelines for music videos
Bit depth: 24 bit is recommended, 16 bit accepted
Sample rate: 44.1 kHz is recommended, higher rates accepted
Channels: Stereo
Format: PCM (WAV or AIFF)
If you insist on delivering compressed audio, the guidelines state:
Codec: AAC-LC
Sample rate: 44.1 kHz
Bit rate: 320 kbps or higher, 256 kbps accepted
Channels: Stereo
(source:
Encoding specifications for music videos - YouTube Help)
YouTube's own guidelines for other or general material
Codec: AAC-LC
Channels: Stereo or Stereo + 5.1
Sample rate: 96 kHz or 48 kHz
Other formats are clearly accepted even though it's not stated, but the above is the official recommendation. Apparently YouTube doesn't want uncompressed audio here, which means transcoding is bound to happen. D'oh!
Other tests I've done show no penalty to the video quality if you upload full-quality PCM audio instead of the recommended AAC, so there you go. I ran that test to see whether the larger total file size (i.e. the audio's contribution) would compromise the video quality.
(source:
Recommended upload encoding settings - YouTube Help)
Test
Obviously there's some potentially conflicting information, so I decided to test what actually sounds or measures best, and what happens at different source sample rates.
Test files
· 24 bit PCM WAV
· 20 Hz to 20 kHz slow sine sweep generated at the actual target sample rate
· Identical sample peak, true peak and integrated loudness, headroom of 6 dB
· Accompanying video is 1080p HQ with no lossy compression of audio.
Why a simple sine sweep? It covers the full frequency spectrum, and it's easy to detect distortion or aberrations. Actual music might mask problems in the converted file (which could be a good thing in real life, though).
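For reference, a log sweep like the test signal can be generated directly at each target sample rate, e.g. in Python (the 30-second duration is my assumption; the 6 dB of headroom matches the test files):

```python
import numpy as np

def sine_sweep(f_start=20.0, f_end=20000.0, duration=30.0, sr=44100, peak_db=-6.0):
    """Logarithmic 20 Hz - 20 kHz sine sweep rendered at the target rate.

    peak_db sets the sample-peak headroom (-6 dBFS, as in the test files)."""
    t = np.arange(int(duration * sr)) / sr
    # Exponential frequency trajectory; the phase is its integral over time.
    k = np.log(f_end / f_start) / duration
    phase = 2.0 * np.pi * f_start * np.expm1(k * t) / k
    return (10.0 ** (peak_db / 20.0)) * np.sin(phase)

# One sweep per source rate, each generated natively (no SRC involved):
sweep_44k = sine_sweep(sr=44100)
sweep_96k = sine_sweep(sr=96000)
```

Generating at the actual target rate (rather than converting one master sweep) keeps sample-rate conversion out of the upload side of the test.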
Maybe I'll do a square wave test as well to see what happens with inter-sample peaks, and how these could also nudge the loudness reading down slightly.
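Inter-sample peaks are exactly what a true-peak meter tries to catch: the reconstructed waveform can exceed the highest sample value. A rough numpy-only sketch of the idea (a simplified stand-in for the 4x-oversampling method of ITU-R BS.1770, not YouTube's actual metering):

```python
import numpy as np

def true_peak_db(x, factor=4):
    """Estimate true peak by band-limited oversampling via FFT zero-padding.

    Simplified: assumes no energy at Nyquist and near-periodic content."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # irfft normalizes by the (longer) output length, so scale back up.
    y = np.fft.irfft(padded, n=n * factor) * factor
    return 20.0 * np.log10(np.max(np.abs(y)))

# Classic demo: a tone at fs/4 sampled 45 degrees off-peak. Every sample lands
# at +/-0.707, so the sample peak reads about -3 dBFS while the true peak is 0 dBTP.
n = np.arange(4800)
x = np.sin(2.0 * np.pi * 0.25 * n + np.pi / 4.0)
sample_peak = 20.0 * np.log10(np.max(np.abs(x)))  # ~ -3.0 dBFS
tp = true_peak_db(x)                              # ~ 0.0 dBTP
```

This gap between sample peak and true peak is why the dBTP and dBFS readings below differ slightly.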
Captured audio
· Converted audio from YouTube was captured digitally
· Converted audio was sharply low-pass filtered @ 15 kHz by YouTube
· Converted audio was lowered in level by YouTube to match YouTube’s loudness normalization target
System playback @ 44.1 kHz
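The level drop in the capture is consistent with YouTube's loudness normalization, which (as far as is publicly documented) only turns loud material down toward a target of roughly -14 LUFS and never boosts quiet uploads. The exact target is my assumption, but the logic is simply:

```python
def youtube_gain_db(integrated_lufs, target_lufs=-14.0):
    """Attenuation applied by loudness normalization: down to target, never up."""
    return min(0.0, target_lufs - integrated_lufs)

# A -9 LUFS master gets turned down 5 dB; a -20 LUFS master is left alone.
```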
Results
Uploaded source: 44.1 kHz
Aliasing artifacts: None
Other noise: Very low noise around the AAC noise floor @ 20-500 Hz
True Peak level: -14.93 dBTP
Sample peak level: -15.01 dBFS
Max. momentary loudness: -13.1 LUFS (I) (loudest in test)
Total RMS level: -16.52 dB (loudest in test)
Uploaded source: 48 kHz
Aliasing artifacts: Yes
Other noise: No
True Peak level: -15.63 dBTP (lowest true peak in test)
Sample peak level: -15.69 dBFS (lowest sample peak in test)
Max. momentary loudness: -13.2 LUFS (I)
Total RMS level: -16.53 dB
Uploaded source: 96 kHz
Aliasing artifacts: None
Other noise: No
True Peak level: -14.87 dBTP (highest true peak in test)
Sample peak level: -14.93 dBFS (highest sample peak in test)
Max. momentary loudness: -13.2 LUFS (I)
Total RMS level: -16.53 dB
Conclusion
It's hard to say what will sound worse on YouTube, but the 48 kHz conversion is the only one that has aliasing artifacts, which is a big no-no in my book. The 48 kHz version also features the lowest peak values after conversion for some reason.
Distortion is almost identical between the 44.1 and 96 kHz files; the 48 kHz file seems to have fewer problems in the low mids but distortion in more areas in the high end.
On the other hand, the 44.1 kHz conversion was the only one to feature some measurable noise around the AAC noise floor, at approximately -100 dBFS, which I would expect to be in either all or none of the conversions.
For now I think it's safe to conclude that if you've mastered a track at a target rate of 44.1 kHz for digital aggregation or CD, there's no point in doing a specific 48 kHz version for YouTube; it could even be detrimental to the sound.
However, a 96 kHz version shouldn't be bad if you're already working at that sample rate, though upsampling a lower-rate master later won't help.
Disclaimer and other practical considerations
Here's the catch:
Many or most video editors work in 48 kHz sessions by default and will import a 44.1 kHz master into such a session, automatically converting it to 48 kHz using whatever crap SRC is built into the video software. It's lazy, but that's the way it is.
So ironically you might be better off doing your own high-quality 48 kHz SRC (e.g. in iZotope RX 7) for the video guy/gal, even though it's likely worse for YouTube in the end. A better solution would be to educate the video editor and label, if you have the guts, but then there's VEVO...
VEVO is apparently different. Specs here are 44.1 kHz for AAC, but 24 or 16 bit 48 kHz for PCM, with Little Endian byte order during video export. So there's no way around a 48 kHz version for VEVO if the specs are to be believed, i.e. the video might be rejected on upload if it's PCM and not 48 kHz.
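If you do end up making your own 48 kHz version, a decent polyphase conversion (sketched here with scipy as a stand-in for a dedicated SRC like RX) uses the exact integer ratio between the two rates, since 48000/44100 reduces to 160/147:

```python
import numpy as np
from scipy.signal import resample_poly

def convert_44k_to_48k(x):
    """44.1 kHz -> 48 kHz sample rate conversion (48000/44100 = 160/147)."""
    return resample_poly(x, up=160, down=147)

# One second of a 1 kHz tone at 44.1 kHz becomes one second at 48 kHz.
tone = np.sin(2.0 * np.pi * 1000.0 * np.arange(44100) / 44100.0)
converted = convert_44k_to_48k(tone)
```

The quality then comes down to the anti-aliasing filter; resample_poly's default is serviceable, which is more than can be said for some video editors' built-in SRC.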
Pics
[Image: 44.1 kHz sine sweep after conversion]
[Image: 48 kHz sine sweep after conversion — notice the aliasing lines in the background, not present in the other conversions]
[Image: 96 kHz sine sweep after conversion]