August 12, 2014 / Pratik

Archiving a Podcast

I wanted to archive all the episodes of a video podcast. The podcast listed every episode in its own RSS feed, but the filenames didn't include the episode number. So I wrote a quick Python script that generates a bash script, which in turn downloads the listed episodes and adds the episode number to each filename. I went with the approach of generating a bash script because it makes it easy to review each filename, and what is going to be downloaded, before anything runs. That review matters: downloading a lot of files can take a long time, and there is a risk of naming things the wrong way.

#!/usr/bin/env python3

import xml.etree.ElementTree as etree
import math
import sys

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Usage: ' + sys.argv[0] + ' location_of_downloaded_rss_file')
        sys.exit(1)

    tree = etree.parse(sys.argv[1])
    root = tree.getroot()
    channel = root[0]
    urls = []
    current_episode_number = 0
    total_episodes = 0
    maximum_number_of_digits = 0

    for item in channel.iter('item'):
        enclosure = item.find('enclosure')

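        # RSS 2.0 stores the media location in the enclosure's 'url' attribute.
        if enclosure is not None:
            urls.append(enclosure.get('url'))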

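    total_episodes = len(urls)
    if total_episodes == 0:
        sys.exit('No enclosures found in ' + sys.argv[1])
    # Number of digits in the highest episode number, used for zero-padding.
    maximum_number_of_digits = int(math.log10(total_episodes))+1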

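    # pop() takes URLs from the end of the list, so the last item in the feed
    # (typically the oldest episode) becomes episode 1.
    while len(urls) != 0: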
        url = urls.pop()
        urlList = url.split('/')
        filename = urlList[-1]
        current_episode_number += 1
        current_episode_number_padded = str(current_episode_number).zfill(maximum_number_of_digits)

        print("wget '" + url + "' -O '" + current_episode_number_padded + '_' + filename + "' ")
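
One thing to watch for when reviewing the generated script: a URL or filename containing a single quote would break the hand-written quoting above. Below is a minimal sketch of the same numbering and padding logic with the quoting left to `shlex.quote` from the standard library. The function name `build_wget_commands` and the example URLs are made up for illustration, and it assumes the feed lists the newest episode first, as the `pop()` order in the script implies.

```python
import math
import shlex


def build_wget_commands(urls_newest_first):
    """Return one wget command per URL, numbering the oldest episode as 1."""
    total = len(urls_newest_first)
    if total == 0:
        return []
    # Same padding rule as the script: digits needed for the largest number.
    width = int(math.log10(total)) + 1
    commands = []
    # Assumption: the list is newest first, so walk it backwards.
    for number, url in enumerate(reversed(urls_newest_first), start=1):
        filename = url.split('/')[-1]
        target = str(number).zfill(width) + '_' + filename
        # shlex.quote escapes spaces, quotes and other shell metacharacters.
        commands.append('wget ' + shlex.quote(url) + ' -O ' + shlex.quote(target))
    return commands


if __name__ == '__main__':
    # Hypothetical feed URLs, newest first.
    example = [
        'http://example.com/media/second show.mp4',
        'http://example.com/media/first.mp4',
    ]
    for line in build_wget_commands(example):
        print(line)
```

`shlex.quote` only adds quotes when a string needs them, so plain URLs stay unquoted in the output while anything with spaces or quote characters is escaped.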
