Read XML painlessly
I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries, but I didn’t like any of them. Is there a simple, brain-friendly way to do this? After all, it’s Python, so everything should be simple.
Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s on PyPI, so installation is very easy:
sudo pip install untangle
For some examples, visit the project page.
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.
#!/usr/bin/env python3
import untangle

# XML = 'examples/planet_python.xml'  # can read a local file too
XML = 'http://planet.python.org/rss20.xml'

o = untangle.parse(XML)
# Each <item> of the channel is one post
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:  # skip items without a URL
        print(title)
        print(' ', link)
It couldn’t be any simpler :)
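To get a feel for how attribute-style access like o.rss.channel.item can work, here is a minimal sketch built on the standard library’s xml.etree.ElementTree. It is not untangle’s actual implementation: the Node class and parse function are hypothetical names used only for illustration, and a small inline feed stands in for the real Planet Python XML.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical untangle-like wrapper (not untangle's real code)."""

    def __init__(self, element):
        self._element = element

    @property
    def cdata(self):
        # Text directly inside the element, stripped ('' if there is none)
        return (self._element.text or "").strip()

    def __getattr__(self, name):
        # Only called for names not found normally: look up child elements
        children = [Node(child) for child in self._element.findall(name)]
        if not children:
            raise AttributeError(name)
        # One match -> a single Node, several -> a list of Nodes
        return children[0] if len(children) == 1 else children

    def __iter__(self):
        # A single Node iterates as itself, so a for loop works either way
        yield self


def parse(xml_text):
    """Parse an XML string and expose the root tag as an attribute."""
    wrapper = ET.Element("document")
    wrapper.append(ET.fromstring(xml_text))
    return Node(wrapper)


SAMPLE = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

o = parse(SAMPLE)
for item in o.rss.channel.item:
    print(item.title.cdata, item.link.cdata)
```

In this sketch, returning a single node for one match and a list for several, while letting a single node iterate as itself, is what allows the same for loop to work whether a feed has one item or many.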
According to Chris,
untangle doesn’t support documents with namespaces (yet).
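If a document does use namespaces, one option is the standard library’s xml.etree.ElementTree, which expands every namespaced tag to {uri}localname and lets find/findall take a prefix-to-URI mapping. A minimal sketch, assuming an Atom feed (namespace http://www.w3.org/2005/Atom) given as a small inline sample instead of a real download; the atom prefix is an arbitrary name chosen here.

```python
import xml.etree.ElementTree as ET

# Small inline Atom-style document; the default xmlns puts every element
# in the Atom namespace
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First post</title><link href="http://example.com/1"/></entry>
  <entry><title>Second post</title><link href="http://example.com/2"/></entry>
</feed>"""

# Prefix -> URI mapping for find/findall; "atom" is our own arbitrary prefix
NS = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(SAMPLE)
print(root.tag)  # {http://www.w3.org/2005/Atom}feed

# A bare tag name matches nothing, because the real tag is '{uri}entry'
print(len(root.findall("entry")))  # 0

entries = []
for entry in root.findall("atom:entry", NS):
    title = entry.findtext("atom:title", namespaces=NS)
    link = entry.find("atom:link", NS).get("href")  # Atom links live in href
    entries.append((title, link))

for title, link in entries:
    print(title, link)
```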
Alternatives (update 20111031)
Here are some alternatives (thanks, reddit).
lxml and amara are heavyweight solutions built upon C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading small, simple XML files.
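For comparison, the Planet Python example can also be written with only the standard library’s xml.etree.ElementTree, which ships with Python. A minimal sketch, assuming the feed text is already in memory; the titles_and_links function and the inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def titles_and_links(xml_text):
    """Return (title, link) pairs for each RSS <item> that has a link."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    pairs = []
    for item in root.findall("channel/item"):
        # findtext returns '' for an empty element, default if it is missing
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link:  # same filter as the untangle example: skip items without a URL
            pairs.append((title, link))
    return pairs


SAMPLE = """<rss version="2.0"><channel>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>No link</title><link></link></item>
</channel></rss>"""

for title, link in titles_and_links(SAMPLE):
    print(title)
    print(' ', link)
```

To run it on the real feed, pass in the downloaded bytes, e.g. urllib.request.urlopen('http://planet.python.org/rss20.xml').read(); ET.fromstring accepts bytes as well as str.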