Using open source tools to capture my favorite radio program audio stream
Posted: 18 February 2009 at 18:48:24
Listen to any kind of syndicated talk radio program and you'll usually hear about some companion website the program has. Usually, there are a handful of free things you can get on a program's website, but many of these sites have a pay-to-play members' area where the really good content is. This includes MP3 downloads of the shows, access to live audio and/or video streams, special behind-the-scenes content, forums, desktop backgrounds, etc.
The MP3 downloads are very convenient for people who don't have the luxury of sitting in front of a radio (or driving a car) for a solid three hours while a radio program is broadcast (with advertisements). It's also a boon for people who find radio advertisements annoying.
The only problem with the MP3 downloads is that theme music and produced portions of the program cannot, by law, be included in the MP3 file; otherwise the MP3 would be a copyright violation.
Live streams, on the other hand, are not subject to the restriction described above because they're like a broadcast in nature. They're not a time-shift of the original program. So, if you listen to the live stream or even listen to a pre-recorded program as a stream, music and produced segments may be included.
I listen to the Glenn Beck radio program quite often. I used to download the MP3 files to listen to in the car, but it got annoying every time Glenn and his producers would put together a segment like "Sportscasters at the 2031 animal-human hybrid baseball games", or "The History Of the Democratic Superdelegates" and I would hear Glenn say, "Listen to this... [pause] Oh man! That was great! Wasn't that great, Stu? Oh yeah! Alright! Dan? Wasn't that just the best? Yeah. Oh yeah."
I decided I needed to figure out how to save a stream.
I knew it was possible. Lots of software applications exist, for every operating system, that will convert audio from a live stream into a static WAV file or similar. The open source program mplayer is one such example.
Breaking it down
First of all, I needed to figure out how the stream content made its way to my computer.
After I've logged into the Glenn Beck website as an Insider, I can click a link to listen to a stream of a particular hour of the program (or the whole program) in Windows Media format or RealAudio format. I figured I'd have better luck extracting the audio from the Windows Media format, so I went that route. Instead of just clicking the link and letting my web browser find some program that could handle the content, I saved the content to a file and then looked at the file.
The file it saved was a fairly straightforward XML file that looked something like this:
    <ASX VERSION="3.0">
      <TITLE>Glenn Beck</TITLE>
      <AUTHOR>Premiere Radio Networks</AUTHOR>
      <COPYRIGHT>Copyright 2008</COPYRIGHT>
      <ENTRY>
        <TITLE>Glenn Beck 1</TITLE>
        <AUTHOR>Premiere Radio Networks</AUTHOR>
        <COPYRIGHT>Copyright 2008</COPYRIGHT>
        <REF HREF="mms://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603.WMA?auth=blahblahblahblahblah" />
        <REF HREF="http://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603.WMA?auth=blahblahblahblahblahblah" />
      </ENTRY>
      <ENTRY>
        <TITLE>Glenn Beck 2</TITLE>
        <AUTHOR>Premiere Radio Networks</AUTHOR>
        <COPYRIGHT>Copyright 2008</COPYRIGHT>
        <REF HREF="mms://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603_CLIP01.WMA?auth=blahblahblahblahblahblah" />
        <REF HREF="http://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603_CLIP01.WMA?auth=blahblahblahblahandblah" />
      </ENTRY>
...and so on.
This XML defines the MMS URLs for each segment of the show. There are several segments each hour. These individual MMS URLs are what I needed to feed to the application that was going to convert the audio stream to a file. In my case, I decided to use mplayer because it's just so good at everything it does!
The command line for doing the stream-to-file conversion looks like this:
    mplayer -vc null -vo null -ao pcm:fast:file=dumpfile.wav \
        'mms://a0011.v67134.c6713.g.vm.akamaistream.net/blahblahblah...'
The real magic in the above command is where I use -ao pcm to tell mplayer to use the PCM file writer audio output driver (instead of sending the audio to my speakers).
This gives me a WAV file which I'll want to convert to an MP3 or Ogg-Vorbis file.
To convert a WAV file generated by the mplayer command above to an MP3 file, I use the open source lame tool:
lame -mf -q2 dumpfile.wav GlennBeck.mp3
Or, convert it to Ogg-Vorbis (the completely open and better-sounding-than-MP3 lossy audio codec):
oggenc -q2 --downmix -o GlennBeck.ogg dumpfile.wav
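Since a script will eventually run one of these two encoders, the choice can be wrapped in a small Perl helper. This is only a sketch: encoder_command() is a hypothetical name, and the flags are simply the ones from the lame and oggenc commands above. It returns the command as a list so it can be handed straight to system().

```perl
use strict;
use warnings;

# Build the encoder command as a list for a given output format.
# encoder_command() is a hypothetical helper; the flags are copied from
# the lame and oggenc command lines shown in the article.
sub encoder_command {
    my ($format, $wav, $basename) = @_;
    if ($format eq 'mp3') {
        return ('lame', '-mf', '-q2', $wav, "$basename.mp3");
    }
    if ($format eq 'ogg') {
        return ('oggenc', '-q2', '--downmix', '-o', "$basename.ogg", $wav);
    }
    die "unsupported format: $format\n";
}

my @cmd = encoder_command('ogg', 'dumpfile.wav', 'GlennBeck');
print join(' ', @cmd), "\n";
# To actually run the encoder:
# system(@cmd) == 0 or die "encoder failed: $?\n";
```

Passing the command as a list avoids the shell entirely, so file names with spaces or quotes need no escaping.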
I've now covered the basic mechanical components of converting an audio stream into an MP3 or Ogg-Vorbis file. Next I automate it all.
Automation
Because I'm a long-time Perl junkie, I investigated how I could use a Perl script to act as the glue between the components and automate the whole process of capturing a stream and converting it to MP3 or Ogg-Vorbis.
In the above walk-through, I manually logged into the Glenn Beck website with my web browser. To really completely automate this puppy, I wanted the script to log in for me. It didn't take me very long to figure out the Perl CPAN module WWW::Mechanize was what I needed to use.
WWW::Mechanize does several handy things for the programmer. It loads and parses web pages, and it can follow links, populate forms, and handle other basic kinds of interaction. It keeps track of its own cookies and session data too.
To get into the Insider area of the Glenn Beck website, members must enter their username and password on the Insider login page.
Looking at the HTML source for this page, I learned the form was named "aform", the username field was named "iUName", and the password field was named "iPassword".
I now had all the information I needed for WWW::Mechanize to log in:
    use WWW::Mechanize;

    # Start with an empty in-memory cookie jar.
    my $agent = WWW::Mechanize->new(
        cookie_jar => {},
    );

    # Fetch the login page, then fill in and submit the form named "aform".
    my $resp = $agent->get('http://www.glennbeck.com/content/insider');
    if ($resp->is_success) {
        $resp = $agent->submit_form(
            form_name => 'aform',
            fields    => {
                'iUName'    => 'myusername',
                'iPassword' => 'shhhhhhhh!',
            },
            button => 'submit',
        );
    }
Walking through the code above: First, I create the WWW::Mechanize object with an in-memory cookie jar (cookie_jar => {}). Next, I use the object to get() the log-in page. If everything works well so far, I tell the object to find the form named "aform", fill in the username and password fields, and submit the form.
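The username and password in that snippet are typed straight into the script. One possible alternative, sketched here, is to read them from the environment. The variable names GB_USER and GB_PASS and the helper credentials_from_env() are made up for this example; only the field names iUName and iPassword come from the login form.

```perl
use strict;
use warnings;

# Read login details from a hash of environment variables instead of the
# script source. GB_USER and GB_PASS are hypothetical names for this sketch.
sub credentials_from_env {
    my ($env) = @_;
    for my $name (qw(GB_USER GB_PASS)) {
        die "environment variable $name is not set\n"
            unless defined $env->{$name} && length $env->{$name};
    }
    # Keys match the form field names found in the login page source.
    return (
        iUName    => $env->{GB_USER},
        iPassword => $env->{GB_PASS},
    );
}

# Example with a stand-in hash; a real script would pass \%ENV.
my %fields = credentials_from_env(
    { GB_USER => 'myusername', GB_PASS => 'shhhhhhhh!' }
);
print "would log in as $fields{iUName}\n";
```

The returned list could then be used as fields => { credentials_from_env(\%ENV) } in the submit_form() call.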
One thing I realized as I was debugging my script was that after I logged in on the Insider page, I was immediately redirected to another page. In order for my script to work, it needed to follow the redirect. This was an easy fix:
    my $agent = WWW::Mechanize->new(
        cookie_jar  => {},
        redirect_ok => 1,
    );
The page I got redirected to has the links on it for the streaming audio, so I'm exactly where I want to be if I want to capture and convert the latest and greatest Glenn Beck Program audio stream.
WWW::Mechanize can find links within the page with a variety of methods. One of these leverages Perl's excellent support for regular expressions. You can also search for links by the order in which they appear. The link I'm looking for looks like this:
<a href="http://www.premiereinteractive.com/cgi-bin/members.cgi?stream=shows/GLENNBECKWIN20080604&site=glennbeck&type=win_show"><img src="http://media.glennbeck.com/images/common/header_media5off.jpg" name="icon5" width="26" height="34" border="0" id="icon5" onMouseOver="MM_swapImage('icon5','','http://media.glennbeck.com/images/common/header_media5on.jpg',1)" onMouseOut="MM_swapImgRestore()" /></a>
So, my script has the following:
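    # find_link() returns undef when nothing matches, so stop early in that case.
    my $link = $agent->find_link(url_regex => qr/${datestr}.*win_show$/)
        or die "no stream link found for $datestr\n";
    $resp = $agent->get($link);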
This assumes I have a scalar variable $datestr that contains a formatted date for the show I want to capture.
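For completeness, here is one way $datestr might be built. This is a sketch that assumes the date is written as YYYYMMDD, which is what the sample link (GLENNBECKWIN20080604) suggests; datestr_for() is a hypothetical helper name. The last few lines check the pattern against a URL shaped like the sample link.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as YYYYMMDD, the layout that appears in the stream
# names (for example GLENNBECKWIN20080604). Defaults to the current time.
sub datestr_for {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime('%Y%m%d', localtime($epoch));
}

my $datestr = datestr_for();

# Check the link pattern against a URL shaped like the sample link.
my $sample = "http://www.premiereinteractive.com/cgi-bin/members.cgi"
           . "?stream=shows/GLENNBECKWIN${datestr}&site=glennbeck&type=win_show";
print "pattern matches sample link\n"
    if $sample =~ qr/${datestr}.*win_show$/;
```

To capture an older show, pass a different epoch value, such as time - 24 * 60 * 60 for roughly one day earlier.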
Originally, I was going to use one of Perl's several XML-parsing modules to make sense of the XML in the stream link, but in the end all I needed was a regular expression to extract the mms: URLs.
    my $xml = $resp->decoded_content;
    my (@urls) = $xml =~ m/HREF="(mms:[^"]+)"/msg;
This gives me a list of URLs stored in @urls. Now I just need to feed them to mplayer:
    my $i = 1;
    foreach my $u (@urls) {
        # Zero-pad the segment number so the files sort in playback order.
        my $seq = sprintf("%02d", $i);
        my @cmd = (
            'mplayer', '-vc', 'null', '-vo', 'null',
            '-ao', "pcm:fast:file=${datestr}-${seq}.wav",
            $u);
        system(@cmd);
        # Report how mplayer finished, based on the wait status in $?.
        if ($? == -1) {
            print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
            printf "child died with signal %d, %s coredump\n",
                ($? & 127), ($? & 128) ? 'with' : 'without';
        }
        else {
            printf "child exited with value %d\n", $? >> 8;
        }
        $i++;
    }
This little ditty creates an output file for each of the segment streams. These are named something like 20080604-05.wav.
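The two fiddly parts of that loop, the zero-padded file name and the decoding of $?, can be pulled out into small functions and tried on their own. This is a sketch with hypothetical helper names (segment_wav and describe_status). It uses the running perl binary as a stand-in for mplayer so it works without a stream.

```perl
use strict;
use warnings;

# Name of the WAV file for segment number $n (1-based), zero-padded to two
# digits so the files sort in playback order.
sub segment_wav {
    my ($datestr, $n) = @_;
    return sprintf('%s-%02d.wav', $datestr, $n);
}

# Turn a wait status (the value of $? after system()) into a short message.
# describe_status() is a hypothetical helper, not part of the original script.
sub describe_status {
    my ($status, $errno_text) = @_;
    return "failed to execute: $errno_text" if $status == -1;
    if ($status & 127) {
        return sprintf('died with signal %d, %s coredump',
            $status & 127, ($status & 128) ? 'with' : 'without');
    }
    return sprintf('exited with value %d', $status >> 8);
}

print segment_wav('20080604', 5), "\n";

# $^X is the running perl binary, used here as a stand-in for mplayer.
system($^X, '-e', 'exit 3');
print describe_status($?, "$!"), "\n";
```

The low seven bits of $? hold the signal number and the high byte holds the exit code, which is why the code masks with 127 and shifts right by 8.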
When the loop is finished, I have several WAV files sitting on the disk. Now I need to somehow sew them all together into one big WAV file so I can convert it to an MP3 or Ogg-Vorbis file. For this, I turn to sox. I decided to have the Perl script generate a shell script to run all the sox and lame commands needed.
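    open(my $fh, '>', "/tmp/${datestr}.sh")
        or die "cannot write /tmp/${datestr}.sh: $!\n";
    # Append each segment's raw samples to one file, in order.
    foreach my $j (1 .. ($i - 1)) {
        my $seq = sprintf("%02d", $j);
        print $fh 'sox ', "${datestr}-${seq}.wav",
            " -t raw - | cat >> /tmp/${datestr}.raw", "\n";
    }
    # Turn the raw samples back into a WAV file, then encode and tag the MP3.
    print $fh 'sox -w -s -c 1 -r 22050 ',
        "/tmp/${datestr}.raw ${datestr}.wav\n";
    print $fh "lame -mf -q2 ${datestr}.wav ${datestr}.mp3 ";
    print $fh "--tt \"Glenn Beck Show - $datestr\" ";
    print $fh "--ta \"Glenn Beck\" --add-id3v2\n";
    close $fh or die "cannot close /tmp/${datestr}.sh: $!\n";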
Then, I run the shell script:
system('sh', "/tmp/${datestr}.sh");
Finally, I do a little cleanup:
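    # The segment files were written with two-digit sequence numbers
    # (for example 20080604-05.wav), so the cleanup uses the same format.
    unlink "/tmp/${datestr}.sh", "/tmp/${datestr}.raw",
        map { sprintf("%s-%02d.wav", $datestr, $_) } 1 .. ($i - 1);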
And, I'm done. There are many other ways I could have gone about doing this, but I found a way that worked and ran with it. I'd love to hear from people who have done something similar and how they did it.
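As one example of those other ways: sox concatenates its input files when it is given more than one, so the intermediate raw file and shell script could probably be skipped. The sketch below only builds the command. sox_concat_command() is a hypothetical helper, and it assumes all the segment WAVs share the same sample rate and channel count.

```perl
use strict;
use warnings;

# Build a single sox command that joins the segment WAVs into one file.
# sox concatenates multiple input files in the order given; this sketch
# assumes every segment has the same sample rate and channel count.
sub sox_concat_command {
    my ($datestr, $count) = @_;
    my @inputs = map { sprintf('%s-%02d.wav', $datestr, $_) } 1 .. $count;
    return ('sox', @inputs, "$datestr.wav");
}

my @cmd = sox_concat_command('20080604', 3);
print join(' ', @cmd), "\n";
# To run it:
# system(@cmd) == 0 or die "sox failed: $?\n";
```

This is a different technique from the raw-file approach in the script above, not a description of it. I have not compared the two on real captures.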