Home > python > Write XML to file

Write XML to file

Problem

I wanted to create an XML file. The file was simple but I wanted to avoid producing it with “print” commands. Which API should be used for this purpose? The produced XML should be human readable, i.e. pretty printed (indented).

Solution

This post is based on the thread Best XML writing tool for Python.

(1) elementtree.SimpleXMLWriter (no indenting)

The SimpleXMLWriter module contains a simple helper class for applications that need to generate well-formed XML data. The interface is very simple:

#!/usr/bin/env python

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)
html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)

However, the output is not indented and as I saw, this feature is missing :( Here is the output of the code above:

<html><head><title>my document</title><meta name="generator" value="my application 1.0" /></head><body><h1>this is a heading</h1><p>this is a paragraph</p><p>this is <b>bold</b> and <i>italic</i>.</p></body></html>

If we prettify it, it will look like this:

<?xml version="1.0"?>
<html>
  <head>
    <title>my document</title>
    <meta name="generator" value="my application 1.0"/>
  </head>
  <body>
    <h1>this is a heading</h1>
    <p>this is a paragraph</p>
    <p>this is <b>bold</b> and <i>italic</i>.</p>
  </body>
</html>

You can install elementtree from PyPI.

(2) lxml.etree (can do indenting)

This is what I chose for my project. This API is also very easy to use and it can do indenting. Documentation is here.

Example:

#!/usr/bin/env python

from lxml import etree as ET

root = ET.Element('background')
starttime = ET.SubElement(root, 'starttime')
hour = ET.SubElement(starttime, 'hour')
hour.text = '00'
minute = ET.SubElement(starttime, 'minute')
minute.text = '00'
second = ET.SubElement(starttime, 'second')
second.text = '01'

print ET.tostring(root, pretty_print=True, xml_declaration=True)
# write to file:
# tree = ET.ElementTree(root)
# tree.write('output.xml', pretty_print=True, xml_declaration=True)

Output:

<?xml version='1.0' encoding='ASCII'?>
<background>
  <starttime>
    <hour>00</hour>
    <minute>00</minute>
    <second>01</second>
  </starttime>
</background>

Installation:
On PyPI, you can find lxml here. However, you will have to install some additional packages too:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
# until Ubuntu 10.10:
sudo apt-get install python2.6-dev
# from Ubuntu 11.04:
sudo apt-get install python2.7-dev

Then, you can install the library with “sudo pip install lxml“.

Links

Related posts

About these ads
  1. January 11, 2012 at 12:59

    On Ubuntu you’re probably best off just doing:
    sudo apt-get install python-lxml

    • January 11, 2012 at 14:53

      Hmm, I didn’t think of that :) However, usually it’s better to install via pip since Ubuntu repositories are sometimes very out-of-date.

  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 74 other followers

%d bloggers like this: