Comparison of Audio Codecs and Conversion Utilities on Linux
If you like music as much as I do, you may find yourself pondering the best way to create lossy backups of your CD collection. This article discusses standard audio codecs and briefly covers common conversion tools available on *NIX operating systems. It is written for Debian & Ubuntu, Fedora, and ArchLinux.
- Some basic understanding of audio codecs
e.g. Understanding terminology such as bitrate
Don't panic: in this article I'll introduce the concepts of lossless and lossy encoding and the different encoding methods (VBR, CBR, ABR) for those of you who are new to, or have forgotten, these concepts.
- A working *NIX machine
I'll be writing these examples on my box (ArchLinux), but I'll indicate usage for other distros and POSIX-compliant OSes, so most of these examples should work on other systems (e.g. BSD, Mac OS X (Darwin)).
- Some utility to rip CDs, such as k3b, asunder, or a console utility that I am unable to recall at the moment. Leave suggestions!
I'll use asunder.
3. Audio Encoding Terminology
Let's discuss some basic terminology concerning audio encoding for the Casual End User from the perspective of a Computer Enthusiast (whose life does not completely revolve around audio).
3.1. Lossless vs Lossy
A quick define:$term on Google shows the following:
Lossless: Of or relating to data compression without loss of information
Lossy: Of or relating to data compression in which unnecessary information is discarded
So let me explain. Any file in its original form is uncompressed. When you compress it with an application such as bzip2, 7z, or gzip (you get the idea), the data is run through a complex algorithm that makes the file smaller. Compressing audio is very convenient, especially when transferring large files over the Internet.
Compression ratios can vary. Some archives end up as small as 1% of the size of the original file (a 99% saving), others barely shrink, and in some cases the result is even larger than the original (a negative saving). It all depends on the algorithm, its parameters (e.g. dictionary size, memory allocation), and the file content.
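To make the two ways of quoting a ratio concrete, here is a minimal sketch in shell arithmetic. The function names size_percent and space_saved are made up for illustration, and the byte counts are example values rather than measurements.

```shell
#!/bin/sh
# Hypothetical helpers: express a compression result two ways.
# Both take <original bytes> <compressed bytes> and use integer
# arithmetic, so results are truncated to whole percents.

# Compressed size as a percentage of the original size.
size_percent() {
    echo $(( $2 * 100 / $1 ))
}

# Space saved as a percentage of the original size.
# A negative result means the "compressed" file grew.
space_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

size_percent 1000000 250000   # prints 25
space_saved  1000000 250000   # prints 75
space_saved  1000 1100        # prints -10
```

For real files, `wc -c < $file` gives the byte count to feed into either function.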
As a rule of thumb, text is the easiest content to compress compared with audio, video, and binary content, but those file types can be compressed too. For example, YouTube content is compressed using the MPEG-4 Part 10 (H.264) standard, and binary files can be compressed with an application such as UPX (the Ultimate Packer for eXecutables). The Linux kernel itself is also regularly compressed on Live CDs and USB distros.
Audio has traditionally been delivered in a lossless PCM format on compact discs. Nowadays vendors compress this data into lossy content at 1/10th the size and data bandwidth of the traditional CD (which is one reason why I try to avoid digital purchases of albums).
This article will assume you still purchase CDs or are willing to accept the potential issues of transcoding lossy content to another lossy format. I would NOT recommend doing the latter.
So you must be asking yourself: how does compression like zip or H.264 apply to audio? Well, if you have ever ripped a CD to your computer on *NIX using a tool like cdda2wav, you get a file $file.wav. That file is on average 50 megabytes. YouTube videos are on average 3-8 megabytes and sound pretty good. So you might be saying "whoa, yea!". Back in the day disk space was scarce and bandwidth limited, so people developed compression algorithms as a more efficient way of transferring data (including audio) over the Internet.
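The 50 megabyte figure can be checked with a little arithmetic. The sketch below assumes standard CD audio (44,100 samples per second, 16 bits per sample, two channels) and ignores the small WAV header. pcm_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Size in bytes of uncompressed CD audio for a given duration in seconds.
# pcm_bytes is an illustrative name; the WAV header is not counted.
pcm_bytes() {
    seconds=$1
    sample_rate=44100      # samples per second, per channel
    channels=2             # stereo
    bytes_per_sample=2     # 16 bits
    echo $(( seconds * sample_rate * channels * bytes_per_sample ))
}

pcm_bytes 1      # one second: prints 176400
pcm_bytes 300    # a 5-minute track: prints 52920000
```

A five-minute track therefore comes to roughly 50 megabytes before any compression is applied.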
So when discussing compressed audio there are two forms of compression (defined earlier): lossless and lossy. What's the difference? Well, from the definitions above you may already have an idea. Audio compression can be compared to compression of images. You have lossless formats (PNG, TIFF, etc.) and lossy formats (JPG, and... eh, idk what else?). The lossless images retain all the content of the original file while the lossy files do not. The end user usually never sees the difference. It's magic! You can't really apply lossy compression to actual text files, since a lossy codec uses some complex algorithm to remove "unnecessary" content (and what content could be considered unnecessary in a text file?).
You can see how it's used in photography by testing it out yourself. Download an SVG (a vector file), open it in GIMP, and save it as a PNG and as a JPG. Save the JPG at 95% and at 35% quality. You should notice a difference in both quality and size. The quality of "lossy" content is perceived by the end user! You save space with lossy content, but the trade-off is quality. This concept applies to audio content as well.
Audio files can be lossless (like PNGs or gzip files) or lossy (like JPGs). Lossless files decode to exactly the original uncompressed content, while lossy files do NOT. Examples of lossless (compressed) audio formats include FLAC, WavPack, and APE (WAV files are technically not compressed; WAV is a separate audio format that CD PCM audio is stored in). Examples of lossy audio formats include Vorbis, WavPack (hybrid), and Musepack. Let's continue on to discuss some specifics about lossy formats.
3.2. Bitrate Encoding Techniques
Media files usually have a bitrate, which sets the amount of data the file streams at a given moment during playback. For example, a DVD can stream about 9.5 Mbit of compressed MPEG-2 video per second; that is its bitrate. YouTube streams vary, but I'll guesstimate that HD (720p) videos stream at about 2 Mbit per second, so every second of video you watch delivers about 2 Mbit of content. This is then decoded by your player and viewed by you, the user. Audio works the same way. Both audio and video have several ways of encoding data that use varying bitrates to store data optimally.
There are three main ways of encoding a lossy stream of audio (although these descriptions also apply to video encoding): Variable Bit Rate (VBR), Constant Bit Rate (CBR), and Average Bit Rate (ABR). VBR uses more bits on the parts of the song that require them and fewer on those that do not. CBR uses one defined bitrate that stays constant throughout the song. ABR (a variation of VBR) aims for a predefined average bitrate, e.g. 96 kbit/s, and fluctuates as needed, but not to the same extent as VBR. Generally VBR is considered to provide the highest quality for a given file size and CBR the lowest, with ABR falling in between.
VBR: Variable Bit Rate
ABR: Average Bit Rate
CBR: Constant Bit Rate
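Bitrate translates directly into file size, which is a handy way to compare settings before encoding anything. This is a rough sketch: stream_bytes is a hypothetical helper, 1 kbit/s is taken as 1000 bits per second, and container overhead is ignored.

```shell
#!/bin/sh
# Approximate size in bytes of a stream: <kbit/s> <duration in seconds>.
# stream_bytes is an illustrative name, not part of any encoder.
stream_bytes() {
    kbps=$1
    seconds=$2
    echo $(( kbps * 1000 / 8 * seconds ))
}

stream_bytes 96 300     # 5 minutes at 96 kbit/s: prints 3600000
stream_bytes 256 300    # 5 minutes at 256 kbit/s: prints 9600000
```

For CBR the result is close to exact. For ABR it is the size the encoder aims for, and for VBR it is only an estimate based on the average bitrate the encoder ends up using.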
The last thing I want to talk about is quality. How do you define quality? Well, in the end it's up to the user. Lossless content is always going to be superior to its lossy counterpart, so most of the comparison happens between different lossy codecs. Which is the best? It really just depends on what sounds better to you. What I would recommend doing is what the audio community calls a listening test or an ABX test. In an ABX test you are given two known files, A and B, plus an unknown X that is one of the two, and you try to tell which one X is. Basically you compare $source_lossless vs $lossy_format_one, and then $source_lossless vs $lossy_format_two. As the end user you determine which lossy format sounds better, or has the better perceived quality.
In many cases the lossy file and the lossless file may sound 100% identical. Which is good: the lossy codec is doing its job. The data must be going somewhere, right? Well yea, but your (our) puny human ears really can't tell the difference. If you want to see what is removed when encoding to a lossy format, check out Sonic Visualiser. Also, if you want to try performing ABX tests, check out LinABX. To play it safe I usually encode files at higher bitrates, although admittedly I cannot notice a difference. I usually re-encode the files from a lossless version of my library as needed for DAP devices. I recommend testing each codec out personally on test samples from multiple genres and hearing how the lossy files sound to you.
4. Choosing your codecs
What you actually need depends on what you want to do, what distro you run, and what codec you prefer. I mentioned the FLAC, WavPack, Vorbis (Ogg), and Musepack codecs, so you may be wondering: do I really need all these applications, and will I ever use them all? The answer I give you is: 1. They don't take up much space, and 2. It's nice to test multiple open formats.
It really comes down to personal preference. I'll provide links for the lossless and lossy formats below (all open source & free (as in free beer)) for further personal research. Take a look and decide what you want to use. Personally, I use FLAC as my lossless codec of choice because it's slightly faster and produces smaller files than WavPack (depending on the switches you use) and because it's more widely supported on DAP devices. For lossy encoding I use Vorbis because its quality and size are comparable to those of AAC (MP4), which is considered the industry standard when it comes to audio compression.
You must be wondering where the comparison aspect of this article comes in. Well, let me compare them here. In the area of lossless encoders, FLAC provides (overall) faster and better compression. FLAC is also the most widely supported open source encoder; check FLAC's homepage for a list of devices. WavPack can offer better compression than FLAC at the sacrifice of decompression speed, meaning that at runtime WavPack can be more computationally intensive than FLAC. That being said, WavPack also offers many features similar to FLAC's. One feature unique to WavPack is its ability to create hybrid lossless files. These files retain the functionality of their lossless counterparts but discard the least important bits of the audio to reach a specified bitrate. What's pretty cool about this feature is that these files are significantly smaller than true "lossless" files, and WavPack also allows you to store the discarded bits in a separate file. This file can then be used to recreate the original lossless file. Pretty cool if I do say so myself!
In the area of open source lossy codecs, Vorbis is pretty much the pack leader. It offers superior compression and quality to the various MP3 implementations and quality comparable to AAC (MP3's successor). The other codec I discussed, Musepack, features high quality and very fast decompression, but lacks support for more than 2 channels (unlike the other codecs discussed) and for encoding at sample rates beyond 48 kHz (fully supported by Vorbis). But keeping in mind that most people's setups are stereo at 48 kHz or below, it should still be a competitive encoder.
|Wavpack||http://www.wavpack.com/||Lossless & Hybrid Lossy format|
|1: I did not cover Apple's (recently opened) lossless codec ALAC; perhaps at a later date. At the moment no open source encoders written to that standard exist. However, playback support for this codec and encoders (based on reverse-engineered code) already exist. Check out mplayer or vlc for playback, or alac encoder for an ALAC encoder.|
|2: This is an open source codec that has been in development for many years and may be considered antique by some. I beg to differ. It's a pretty interesting project with active development. I encourage you to check it out.|
The following will discuss how to install the binaries needed to encode in the various formats discussed earlier.
I'll install all these utilities in bulk: On ArchLinux I would run:
# pacman -S flac wavpack vorbis-tools musepack-tools
On Debian (& Ubuntu and their derivatives) I would run:
# aptitude install flac wavpack vorbis-tools musepack-tools
On Fedora I would run:
# yum install flac wavpack vorbis-tools musepack-tools
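If you are not sure which of the three commands applies to your machine, a small script can pick one for you. This is only a sketch: install_cmd is a hypothetical helper, it knows only the three package managers named above, and the package names are the ones used in this article, which may differ on your release. It prints the command rather than running it.

```shell
#!/bin/sh
# Package names as used in this article; adjust for your distro if needed.
PACKAGES="flac wavpack vorbis-tools musepack-tools"

# Print the install command for a given package manager name.
install_cmd() {
    case $1 in
        pacman)   echo "pacman -S $PACKAGES" ;;
        aptitude) echo "aptitude install $PACKAGES" ;;
        yum)      echo "yum install $PACKAGES" ;;
        *)        echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Report the first supported package manager found on this machine, if any.
for pm in pacman aptitude yum; do
    if command -v "$pm" >/dev/null 2>&1; then
        echo "Run as root: $(install_cmd "$pm")"
        break
    fi
done
```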
For those of you who are over-achievers, there is actually a branch of the Vorbis project known as aoTuV. This build improves quality at lower bitrates (optimal for streaming). The current version is 6.03, released 2011-04-25. If you are interested in installing aoTuV, follow these steps:
$ wget https://aur.archlinux.org/packages/li/libvorbis-aotuv/PKGBUILD && makepkg -s
# pacman -U libvorbis-aotuv-b6.03-1-i686.pkg.tar.xz
Pacman and other package managers may "complain" about a conflict with the existing libvorbis package. It's OK to uninstall libvorbis, since aoTuV is fully compatible with programs that require libvorbis. Upon further review, it seems that on Fedora and Debian you would have to install aoTuV manually... (Time to make the switch today!) All jokes aside, those who are still interested in installing from source can run the following commands in the extracted libvorbis-aotuv directory. I would recommend against this unless you know what you're doing: installing an application from source may break compatibility with existing packages (libvorbis). The main branch is just fine for the purposes of this article and for personal usage.
# ./configure
# make
# make install
Run these commands after extracting the libvorbis-aotuv archive (tar xjf $archive.tar.bz2) and navigating into its contents.
So now that you have some converters installed, let me show you some usage examples. Afterwards, I'll show you an application to automate this process. I recommend reading the man pages for further explanation.
$ man oggenc
oggenc uses VBR encoding by default. If you set bitrate_average, bitrate_hard_min, and bitrate_hard_max to the same value, you create Vorbis files encoded using CBR. If you set bitrate_average but do not set bitrate_hard_min and bitrate_hard_max equal to it, you produce Vorbis files encoded using ABR. Most users use VBR because of the quality gains it has over both CBR and ABR; CBR is usually used for streaming. Check the man page for the available options and find out how to use them. The following are several examples with options explained.
$ oggenc $uncompressed.wav -q 8 -o $compressed.ogg
I specified the -q 8 option so that the encoder targets roughly 256 kbit/s on average (it is a VBR quality level, not a fixed rate), and -o to name the output file; without -o, oggenc would treat the second file name as another input. Check out the man page for additional details. Encoding at level 8 will be nearly transparent quality 99.99% of the time. However, I would highly recommend doing your own testing. Many people are satisfied with the perceived quality they are able to achieve at lower bitrates. This is beneficial when using DAP devices because of the space saved, at the cost of some quality.
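As a rough sketch, the three encoding modes might map onto oggenc's plain command-line switches like this. The helper oggenc_cmd is hypothetical and only prints a command line; 128 kbit/s is an arbitrary example value, and using -b, -m, -M and --managed in place of the advanced bitrate_* settings is my assumption about the simplest equivalent, so check the man page for your version.

```shell
#!/bin/sh
# Print (do not run) an oggenc command line for a given mode.
# Usage: oggenc_cmd <vbr|abr|cbr> <input.wav> <output.ogg>
oggenc_cmd() {
    mode=$1
    src=$2
    dst=$3
    case $mode in
        # Quality-based VBR, as in the example above.
        vbr) echo "oggenc -q 8 -o $dst $src" ;;
        # Managed mode with only a target bitrate: ABR.
        abr) echo "oggenc --managed -b 128 -o $dst $src" ;;
        # Target, minimum and maximum all equal: CBR.
        cbr) echo "oggenc --managed -b 128 -m 128 -M 128 -o $dst $src" ;;
        *)   echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

oggenc_cmd vbr track.wav track.ogg
oggenc_cmd abr track.wav track.ogg
oggenc_cmd cbr track.wav track.ogg
```

Printing the command first makes it easy to compare the variants side by side before committing to an encode.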
$ flac -8 $uncompressed.wav
flac will compress a WAV file at its highest level of compression and create an output file with the same name and a .flac extension.
$ flac -d $compressed.flac
flac will decompress a compressed FLAC file back to WAV.
$ wavpack -b384x3 -c -m $uncompressed.wav -o $compressed.wv
Creates a hybrid WavPack file encoded at 384 kbit/s. Additionally the application will store an MD5 sum to verify losslessness, and a correction file that allows a user to reassemble the lossless file at a later time. Since I've covered the majority of the codecs, I'll jump into automated conversion, since it wouldn't be practical to convert 20 songs individually every time you rip a new CD.
7. Automating your conversion
After ripping a CD or several songs into a lossless format such as FLAC using your CD ripper of choice (see below), you may realize that you will probably convert it several times (once for your listening library, again for your DAP, maybe even again for your phone). There are several options to automate this process. I'll use SoundConverter, which supports Vorbis, FLAC, and a variety of other codecs. If you want to try Musepack, you can use utilities such as K3b's internal ripping functionality (pretty extensive last time I checked) or gRip. Installing SoundConverter is pretty easy; depending on your distro, issue one of the following commands:
# pacman -S soundconverter          # ArchLinux
# aptitude install soundconverter   # Debian
# yum install soundconverter        # Fedora
After launching SoundConverter, you should see a dialog box similar to the one below:
Now add the file(s) or a directory of files you created earlier:
Remember: if you haven't ripped your files yet, do so now. I ripped Boston's Third Stage using asunder. You can use other tools like gRip or k3b to do the same thing. Just remember to encode to some lossless codec.
You should configure SoundConverter's settings as needed under Edit -> Preferences:
After SoundConverter finishes, check the output directory. It should contain the output Vorbis files (I created only one) ready for playback. I also kept the source WAV to show the size difference, as you can see:
Well, I think that was a rather lengthy article. I would recommend doing your own listening tests comparing the different lossy codecs at varying levels of compression, and doing additional research on the different codecs available on *NIX. If you're interested in learning about the other utilities that I did not cover, such as k3b or gRip, look up their homepages for further information. If you choose to use the console utilities to encode your audio, you're probably going to have to develop a script to automate the process. If you choose to use bash, you're going to need a utility such as id3v2 for copying tags. Now that you have your music converted, use a music player such as deadbeef, mocp, or Amarok to enjoy your high quality lossy backups!
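As a starting point for such a script, here is a minimal sketch that converts every FLAC file in a directory to Vorbis. It assumes flac and oggenc are installed. The function names, the DRY_RUN switch, and the default quality level of 6 are my own illustrative choices, and tags are not carried over because the audio is piped through as plain WAV data.

```shell
#!/bin/sh
# Map "dir/name.flac" to "dir/name.ogg".
ogg_name() {
    echo "${1%.flac}.ogg"
}

# Convert all .flac files in a directory: convert_dir <dir> [quality]
# With DRY_RUN set to a non-empty value, only print the commands.
convert_dir() {
    dir=$1
    quality=${2:-6}
    for src in "$dir"/*.flac; do
        [ -e "$src" ] || continue    # nothing matched the glob
        dst=$(ogg_name "$src")
        if [ -n "${DRY_RUN:-}" ]; then
            echo "flac -d -c -s \"$src\" | oggenc -Q -q $quality -o \"$dst\" -"
        else
            # Decode FLAC to stdout, encode Vorbis from stdin.
            flac -d -c -s "$src" | oggenc -Q -q "$quality" -o "$dst" -
        fi
    done
}

# Example: preview the commands for the current directory.
DRY_RUN=1 convert_dir . 6
```

Running with DRY_RUN=1 first shows exactly which files would be written, which is worth doing before pointing the script at a whole library.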