Read XML painlessly

Problem
I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way to do this? After all, it’s Python, so everything should be simple.

Solution
Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s on PyPI, so installation is very easy:

sudo pip install untangle

For some examples, visit the project page.

Use Case
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.

#!/usr/bin/env python3

import untangle

# XML = 'examples/planet_python.xml'     # a local file works too
XML = 'http://planet.python.org/rss20.xml'

# parse() builds an object tree; child elements become attributes
o = untangle.parse(XML)
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print(title)
        print('   ', link)

It couldn’t be any simpler :)

Limitations
According to Chris, untangle doesn’t support documents with namespaces (yet).

Alternatives (update 20111031)
Here are some alternatives (thanks reddit).

lxml and amara are heavyweight solutions built on C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading a small and simple XML file.
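
If installing a third-party package is not an option either, the same extraction can probably be done with xml.etree.ElementTree from the standard library. Here is a minimal sketch, not a definitive implementation. The sample feed is made up for illustration (it only stands in for the Planet Python RSS), and extract_posts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed (an assumption), standing in for a real RSS 2.0 document
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Post without a link</title>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def extract_posts(xml_text):
    """Return (title, link) pairs for every <item> that has a non-empty link."""
    root = ET.fromstring(xml_text)      # root is the <rss> element
    posts = []
    for item in root.findall('./channel/item'):
        # findtext() returns the default when the child element is missing
        title = item.findtext('title', default='').strip()
        link = item.findtext('link', default='').strip()
        if link:
            posts.append((title, link))
    return posts


if __name__ == '__main__':
    for title, link in extract_posts(SAMPLE_RSS):
        print(title)
        print('   ', link)
```

It is a bit more verbose than the untangle version, since you look up elements by path instead of using plain attribute access, but it needs nothing beyond a stock Python install.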

Categories: python
  1. Ivan Prolugin
    October 31, 2011 at 07:17

    For rss check out the Feedparser library.

    • October 31, 2011 at 10:12

      Thanks, I didn’t know about Feedparser.

      Update (20111102): there is also speedparser. According to its author it’s much faster than Feedparser.
