lxml doesn’t want to compile on Ubuntu 16.04.
$ sudo apt install libxml2-dev libxslt1-dev python-dev zlib1g-dev
I was getting the error “/usr/bin/ld: cannot find -lz”. It turned out that the package zlib1g-dev was the cure…
Note that this is for Python 2. For Python 3 you might need to install the package
I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way for this? After all, it’s Python, so everything should be simple.
Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s on PyPI, so installation is very easy:
sudo pip install untangle
For some examples, visit the project page.
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.
#!/usr/bin/env python

import untangle

#XML = 'examples/planet_python.xml' # can read a file too
XML = 'http://planet.python.org/rss20.xml'

o = untangle.parse(XML)
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print title
        print ' ', link
It couldn’t be any simpler :)
According to Chris, “untangle doesn’t support documents with namespaces (yet).”
Alternatives (update 20111031)
Here are some alternatives (thanks reddit).
- Python and XML (overview)
- amara [official tutorial]
- xmltodict (converts XML to dict; added on 20141229)
lxml and amara are heavyweight solutions and are built upon C libraries so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice to read a small and simple XML file.
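If neither untangle nor the C-based libraries are an option, the standard library's xml.etree.ElementTree can do the same kind of extraction. Here is a minimal sketch, assuming Python 3 and a small inline feed instead of the live Planet Python URL. SAMPLE_RSS and extract_items are names made up for this illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like an RSS 2.0 document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def extract_items(xml_text):
    """Return a list of (title, link) pairs, skipping items without a link."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    pairs = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link:
            pairs.append((title, link))
    return pairs


for title, link in extract_items(SAMPLE_RSS):
    print(title)
    print("   ", link)
```

It is more verbose than the attribute-style access of untangle, but it needs no installation.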
I wanted to create an XML file. The file was simple but I wanted to avoid producing it with “print” commands. Which API should be used for this purpose? The produced XML should be human readable, i.e. pretty printed (indented).
This post is based on the thread Best XML writing tool for Python.
(1) elementtree.SimpleXMLWriter (no indenting)
The SimpleXMLWriter module contains a simple helper class for applications that need to generate well-formed XML data. The interface is very simple:
#!/usr/bin/env python

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)

html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)
However, the output is not indented and, as far as I could tell, SimpleXMLWriter has no option for it :( Here is the output of the code above:
<html><head><title>my document</title><meta name="generator" value="my application 1.0" /></head><body><h1>this is a heading</h1><p>this is a paragraph</p><p>this is <b>bold</b> and <i>italic</i>.</p></body></html>
If we prettify it, it will look like this:
<?xml version="1.0"?>
<html>
  <head>
    <title>my document</title>
    <meta name="generator" value="my application 1.0"/>
  </head>
  <body>
    <h1>this is a heading</h1>
    <p>this is a paragraph</p>
    <p>this is <b>bold</b> and <i>italic</i>.</p>
  </body>
</html>
You can install elementtree from PyPI.
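The indentation can also be added afterwards, in a separate step. Here is a minimal sketch, assuming Python 3 and the standard library's xml.dom.minidom (which is not part of elementtree). The helper name prettify is made up for this example. Re-indenting inserts whitespace, which would alter mixed content such as the last <p> above, so the sample input leaves that paragraph out.

```python
from xml.dom import minidom

# Un-indented XML like the XMLWriter output above, shortened to
# elements that do not mix text and child elements.
RAW = ('<html><head><title>my document</title></head>'
       '<body><h1>this is a heading</h1></body></html>')


def prettify(xml_text, indent="  "):
    """Parse xml_text and re-serialize it with indentation."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


print(prettify(RAW))
```

This parses the whole document into memory again, so it is only reasonable for small files.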
(2) lxml.etree (can do indenting)
This is what I chose for my project. This API is also very easy to use and it can do indenting. Documentation is here.
#!/usr/bin/env python

from lxml import etree as ET

root = ET.Element('background')
starttime = ET.SubElement(root, 'starttime')
hour = ET.SubElement(starttime, 'hour')
hour.text = '00'
minute = ET.SubElement(starttime, 'minute')
minute.text = '00'
second = ET.SubElement(starttime, 'second')
second.text = '01'

print ET.tostring(root, pretty_print=True, xml_declaration=True)

# write to file:
# tree = ET.ElementTree(root)
# tree.write('output.xml', pretty_print=True, xml_declaration=True)
<?xml version='1.0' encoding='ASCII'?>
<background>
  <starttime>
    <hour>00</hour>
    <minute>00</minute>
    <second>01</second>
  </starttime>
</background>
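Where lxml cannot be installed, much the same tree can be built with the standard library's xml.etree.ElementTree, which offers the same Element/SubElement interface. This is only a sketch, assuming Python 3.9 or newer, because ET.indent was added in 3.9. build_background is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_background():
    """Build the same <background> tree as the lxml example."""
    root = ET.Element("background")
    starttime = ET.SubElement(root, "starttime")
    for tag, value in (("hour", "00"), ("minute", "00"), ("second", "01")):
        ET.SubElement(starttime, tag).text = value
    return root


root = build_background()
ET.indent(root, space="  ")  # adds indentation in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))

# write to file, with an XML declaration:
# ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
```

The stdlib version has no pretty_print argument, which is why the indentation is a separate call here.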
On PyPI, you can find lxml here. However, you will have to install some additional packages too:
sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
# until Ubuntu 10.10:
sudo apt-get install python2.6-dev
# from Ubuntu 11.04:
sudo apt-get install python2.7-dev
# under Ubuntu 14.04 I needed this too:
sudo apt-get install -y zlib1g-dev
Then, you can install the library with “sudo pip install lxml”.
- Best XML writing tool for Python @ SO (Here you can read about several other XML writing APIs.)
- Pretty printing XML in python @ SO