Read XML painlessly
Problem
I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way to do this? After all, it’s Python, so everything should be simple.
Solution
Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s in PyPI, so installation is very easy:
sudo pip install untangle
For some examples, visit the project page.
Use Case
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.
#!/usr/bin/env python
import untangle

# The source can be a URL or a local file.
#XML = 'examples/planet_python.xml'    # can read a file too
XML = 'http://planet.python.org/rss20.xml'

o = untangle.parse(XML)
for item in o.rss.channel.item:
    # .cdata holds the text content of an element
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print(title)
        print('  ' + link)
It couldn’t be any simpler :)
Limitations
According to Chris, untangle doesn’t support documents with namespaces (yet).
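If your feed does use namespaces, one possible fallback is the standard library’s xml.etree.ElementTree, which accepts a prefix-to-URI mapping in its find methods. The sketch below is only an illustration: the sample feed, the extract_posts helper, and the choice of the Dublin Core dc:creator element are my assumptions, not something taken from untangle or from the Planet Python feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (an assumption for illustration only).
# The dc: prefix is bound to the Dublin Core namespace.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <dc:creator>Alice</dc:creator>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <dc:creator>Bob</dc:creator>
    </item>
  </channel>
</rss>
"""

# Map a prefix of our choosing to the namespace URI used in the document.
NS = {'dc': 'http://purl.org/dc/elements/1.1/'}


def extract_posts(xml_text):
    """Return a list of (title, link, creator) tuples from an RSS string."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.findall('./channel/item'):
        title = item.findtext('title')
        link = item.findtext('link')
        # Namespaced children are looked up via the prefix mapping.
        creator = item.findtext('dc:creator', namespaces=NS)
        posts.append((title, link, creator))
    return posts


if __name__ == '__main__':
    for title, link, creator in extract_posts(SAMPLE):
        print(title)
        print('  ' + link + ' (by ' + creator + ')')
```

It is more verbose than the untangle version, but it needs no extra installation and copes with prefixed elements.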
Alternatives (update 20111031)
Here are some alternatives (thanks reddit).
- Python and XML (overview)
- lxml
- amara [official tutorial]
- xmltodict (converts XML to dict; added on 20141229)
lxml and amara are heavyweight solutions built on C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading a small, simple XML file.
For RSS, check out the Feedparser library.
Thanks, I didn’t know about Feedparser.
Update (20111102): there is also speedparser. According to its author, it’s much faster than Feedparser.