August 12, 2014 / Pratik

Archiving a Podcast

I wanted to archive all the episodes of a video podcast. The podcast listed all the episodes in its own RSS feed, but didn’t include the episode number in the filename. So I wrote a quick Python script that generates a bash script, which downloads the listed episodes. The Python script also adds the episode number to each filename. I went with the approach of generating a bash script because it makes it easy to review each filename and each URL before anything is downloaded. That review matters: downloading a lot of files can take a long time, and there is a risk of naming things the wrong way.

#!/usr/bin/env python3

import xml.etree.ElementTree as etree
import sys

if __name__ == '__main__':
    if len(sys.argv) != 2:
        # Print usage to stderr so it never ends up in a redirected bash script.
        print('Usage: ' + sys.argv[0] + ' location_of_downloaded_rss_file', file=sys.stderr)
        sys.exit(1)

    tree = etree.parse(sys.argv[1])
    root = tree.getroot()
    # Assumes <channel> is the first child of the <rss> root element.
    channel = root[0]
    urls = []
    current_episode_number = 0

    # Collect the media URL of every <item> that has an <enclosure>.
    for item in channel.iter('item'):
        enclosure = item.find('enclosure')

        if enclosure is not None:
            urls.append(enclosure.get('url'))

    # Pad every episode number to the width of the largest one.
    # len(str(...)) also works for an empty feed, where math.log10(0) would fail.
    total_episodes = len(urls)
    maximum_number_of_digits = len(str(total_episodes))

    print('#!/bin/bash\n')
    # pop() takes URLs from the end of the list, so the last item in the
    # feed (assumed to be the oldest episode) becomes episode 1.
    while len(urls) != 0:
        url = urls.pop()
        urlList = url.split('/')
        filename = urlList[-1]
        current_episode_number += 1
        current_episode_number_padded = str(current_episode_number).zfill(maximum_number_of_digits)

        print("wget '" + url + "' -O '" + current_episode_number_padded + '_' + filename + "' ")
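
One case the script above does not handle is a URL or filename that contains a single quote or a space, which would break the hand-written quoting in the generated wget line. The following is a minimal sketch of how the command-building step could use shlex.quote from the standard library instead. The function name build_wget_commands and the example.com URLs are made up for illustration and are not part of the original script. It also assumes, like the script, that the feed lists the newest episode first.

```python
import shlex


def build_wget_commands(urls):
    """Return one wget command line per URL, numbering from the last URL.

    Hypothetical helper for illustration; `urls` is a list of enclosure
    URLs in feed order (assumed newest first).
    """
    # Reverse so the last feed entry (assumed oldest) becomes episode 1.
    ordered = list(reversed(urls))
    # Width of the largest episode number, used for zero padding.
    width = len(str(len(ordered)))

    commands = []
    for number, url in enumerate(ordered, start=1):
        # The part after the last '/' is used as the original filename.
        filename = url.rsplit('/', 1)[-1]
        target = str(number).zfill(width) + '_' + filename
        # shlex.quote escapes anything the shell would otherwise interpret.
        commands.append('wget ' + shlex.quote(url) + ' -O ' + shlex.quote(target))
    return commands


if __name__ == '__main__':
    # Made-up URLs, newest first, one of them with awkward characters.
    example_urls = [
        'http://example.com/media/third.mp4',
        "http://example.com/media/it's second.mp4",
        'http://example.com/media/first.mp4',
    ]
    print('#!/bin/bash\n')
    for command in build_wget_commands(example_urls):
        print(command)
```

The generated lines can still be reviewed by eye before running them, which was the point of producing a bash script in the first place. The only difference is that the quoting is left to the standard library rather than to string concatenation.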