Read XML painlessly

I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way to do this? After all, it’s Python, so everything should be simple.

Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s in PyPI, so installation is very easy:

sudo pip install untangle

For some examples, visit the project page.

Use Case
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.

#!/usr/bin/env python3

import untangle

#XML = 'examples/planet_python.xml'     # can read a file too
XML = 'http://planet.python.org/rss20.xml'

o = untangle.parse(XML)
# child elements are reached as attributes; .cdata holds an element's text
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:    # skip items that have no link
        print(title)
        print('   ', link)

It couldn’t be any simpler :)

According to Chris, untangle doesn’t support documents with namespaces (yet).

Alternatives (update 20111031)
Here are some alternatives (thanks reddit).

lxml and amara are heavyweight solutions built on C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading a small, simple XML file.
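For comparison, here is a rough sketch of the same title/link extraction using only the standard library's xml.etree.ElementTree, which needs no extra install. It is not from the untangle project, and the sample feed, the extract_posts helper and the dc:creator field are made up for illustration. The dc:creator lookup is there only to show one way of reading a namespaced element.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed (an assumption for this sketch); a real feed
# would be read from a file or downloaded first.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Sample planet</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <dc:creator>Alice</dc:creator>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <dc:creator>Bob</dc:creator>
    </item>
  </channel>
</rss>
"""

# Prefix-to-URI map used when looking up namespaced elements
DC = {'dc': 'http://purl.org/dc/elements/1.1/'}


def extract_posts(xml_text):
    """Return (title, link, author) tuples, skipping items without a link."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        link = item.findtext('link', default='')
        author = item.findtext('dc:creator', default='', namespaces=DC)
        if link:
            posts.append((title, link, author))
    return posts


for title, link, author in extract_posts(SAMPLE_RSS):
    print(title, '(by %s)' % author)
    print('   ', link)
```

It is more typing than the untangle version, but it may be handy when you can't install anything or when the document uses namespaces.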

Categories: python
  1. Ivan Prolugin
    October 31, 2011 at 07:17

    For rss check out the Feedparser library.

    • October 31, 2011 at 10:12

      Thanks, I didn’t know about Feedparser.

      Update (20111102): there is also speedparser. According to its author it’s much faster than Feedparser.
