Posts Tagged ‘xml’

XML to dict / XML to JSON

December 29, 2014

You have an XML file and you want to convert it to dict or JSON.

Well, if you have a dict, you can convert it to JSON with “json.dump()”, so the real question is: how do you convert an XML file to a dictionary?

There is an excellent library for this purpose called xmltodict. Its usage is very simple:

import xmltodict

# It doesn't work with Python 3! Read on for the solution!
def convert(xml_file, xml_attribs=True):
    with open(xml_file) as f:
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return d

This worked well under Python 2.7 but I got an error under Python 3. I checked the project’s documentation and it claimed to be Python 3 compatible. What the hell?

The error message was this:

Traceback (most recent call last):
  File "/home/jabba/Dropbox/python/lib/jabbapylib2/apps/", line 247, in parse
TypeError: read() did not return a bytes object (type=str)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./", line 27, in <module>
  File "./", line 17, in convert
    d = xmltodict.parse(f, xml_attribs=xml_attribs)
  File "/home/jabba/Dropbox/python/lib/jabbapylib2/apps/", line 249, in parse
    parser.Parse(xml_input, True)
TypeError: '_io.TextIOWrapper' does not support the buffer interface

I even filed an issue ticket :)

After some debugging I found a hint here: you need to open the XML file in binary mode!

XML to dict (Python 2 & 3)
So the correct version that works with Python 3 too is this:

import xmltodict

def convert(xml_file, xml_attribs=True):
    with open(xml_file, "rb") as f:    # notice the "rb" mode
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return d
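
To see why the mode matters, here is a minimal sketch that uses only the standard library. It assumes what the traceback above suggests, namely that the parsing is done by expat, whose ParseFile() expects read() to return bytes under Python 3. The sample XML and the helper names (parse_names, text_mode_fails) are made up for this illustration.

```python
import io
import xml.parsers.expat

# Hypothetical sample document, only for this demonstration.
XML_TEXT = '<root><item id="1">hello</item></root>'


def parse_names(fileobj):
    """Return the element names expat sees while parsing fileobj."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.ParseFile(fileobj)    # calls fileobj.read() internally
    return names


def text_mode_fails():
    """True if parsing a text-mode (str) stream raises TypeError."""
    try:
        parse_names(io.StringIO(XML_TEXT))
    except TypeError:
        return True
    return False


# A bytes stream behaves like a file opened with "rb".
binary_names = parse_names(io.BytesIO(XML_TEXT.encode("utf-8")))
```

Here io.StringIO stands in for a file opened in text mode and io.BytesIO for one opened with "rb".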

XML to JSON (Python 2 & 3)
If you want JSON output:

import json
import xmltodict

def convert(xml_file, xml_attribs=True):
    with open(xml_file, "rb") as f:    # notice the "rb" mode
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return json.dumps(d, indent=4)
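
If you are curious what kind of structure comes out, here is a rough standard-library sketch of the idea. This is not xmltodict's implementation, just an approximation built on ElementTree. It imitates xmltodict's default conventions as I understand them ("@" prefix for attributes, "#text" for text that sits next to attributes or children), and the function names are my own.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element to plain Python data (simplified sketch)."""
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # No attributes and no children: just the text (or None if empty).
        return text or None
    if text:
        node["#text"] = text
    return node


def xml_string_to_json(xml_string):
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=4)


sample_json = xml_string_to_json('<a x="1"><b>one</b><b>two</b><c/></a>')
```

The sketch ignores namespaces and text that follows a child element, so for real work the library is the better choice.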

Categories: python

Check Gmail for new messages

September 8, 2012

I want to check my new Gmail messages periodically. When I get a message from a specific sender (with a specific Subject), I want to trigger some action. How to do that?

Fortunately, Gmail provides an Atom feed of unread messages. All you have to do is visit the feed URL, send your login credentials, fetch the feed and process it.

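import urllib2

FEED_URL = ''    # set this to the URL of the Gmail Atom feed


def get_unread_msgs(user, passwd):
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri=FEED_URL,
        user=user,
        passwd=passwd
    )
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)    # make urlopen() use the auth handler
    feed = urllib2.urlopen(FEED_URL)
    return feed.read()    # the feed as an XML string


if __name__ == "__main__":
    import getpass

    user = raw_input('Username: ')
    passwd = getpass.getpass('Password: ')
    print get_unread_msgs(user, passwd)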

For reading XML I use the untangle module:

import untangle    # sudo pip install untangle

xml = get_unread_msgs(USER, PASSWORD)
o = untangle.parse(xml)
try:
    for e in o.feed.entry:
        title = e.title.cdata
        print title
except IndexError:
    pass    # no new mail
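
The "specific sender, specific Subject" part could be sketched like this with the standard library (Python 3). The sample feed, the addresses and the function name matching_titles are hypothetical, and the namespace is an assumption: I use the Atom 1.0 namespace in the sample, so check what the real feed declares before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: the sample below uses the Atom 1.0 namespace; adjust this
# mapping to whatever namespace the real feed declares.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed with two unread messages.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inbox</title>
  <entry>
    <title>Build failed</title>
    <author><name>CI Bot</name><email>ci@example.com</email></author>
  </entry>
  <entry>
    <title>Lunch?</title>
    <author><name>Alice</name><email>alice@example.com</email></author>
  </entry>
</feed>"""


def matching_titles(feed_xml, sender, subject_part):
    """Return titles of entries from `sender` whose title contains `subject_part`."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        email = entry.findtext("atom:author/atom:email", default="", namespaces=NS)
        if email == sender and subject_part in title:
            titles.append(title)
    return titles
```

If matching_titles() returns a non-empty list, that is the place to trigger the action.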


Categories: python

Serializations: data <-> XML, data <-> JSON, XML <-> JSON

January 20, 2012


  1. Python data to XML and back
  2. Python data to JSON and back
  3. XML to JSON and back


import json
import xmlrpclib
from xml2json import Xml2Json

def data_to_xmlrpc(data):
    """Return value: XML RPC string."""
    return xmlrpclib.dumps((data,)) # arg. is tuple

def xmlrpc_to_data(xml):
    """Return value: python data."""
    return xmlrpclib.loads(xml)[0][0]

def data_to_json(data):
    """Return value: JSON string."""
    data_string = json.dumps(data)
    return data_string

def json_to_data(data_string):
    """Return value: python data."""
    data = json.loads(data_string)
    return data

def xml_to_json(xml):
    """Return value: JSON string."""
    res = Xml2Json(xml).result
    return json.dumps(res)

def json_to_xmlrpc(data_string):
    """Return value: XML RPC string."""
    data = json.loads(data_string)
    return data_to_xmlrpc(data)

def xmlrpc_to_json(xmlrpc):
    """Return value: JSON string."""
    data = xmlrpc_to_data(xmlrpc)
    return data_to_json(data)

The full source code, together with the imported xml2json module, is available here. This work is part of my jabbapylib library.

Unit tests are here; they show you how to use these functions. Examples with comments are here.
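
As a quick illustration, here is a minimal round trip under Python 3, where xmlrpclib is called xmlrpc.client. It repeats only the XML-RPC and JSON helpers from above (the xml2json part is left out), and the sample data is made up.

```python
import json
import xmlrpc.client    # Python 3 name of xmlrpclib


def data_to_xmlrpc(data):
    """Return value: XML RPC string."""
    return xmlrpc.client.dumps((data,))    # arg. is tuple


def xmlrpc_to_data(xml):
    """Return value: python data."""
    return xmlrpc.client.loads(xml)[0][0]


def xmlrpc_to_json(xml):
    """Return value: JSON string."""
    return json.dumps(xmlrpc_to_data(xml))


def json_to_xmlrpc(data_string):
    """Return value: XML RPC string."""
    return data_to_xmlrpc(json.loads(data_string))


# Hypothetical sample data: data -> XML-RPC -> JSON -> XML-RPC -> data
data = {"name": "jabba", "langs": ["python", "java"], "year": 2012}
xml = data_to_xmlrpc(data)
as_json = xmlrpc_to_json(xml)
back = xmlrpc_to_data(json_to_xmlrpc(as_json))
```

Keep in mind that neither format knows about tuples, so a tuple comes back as a list.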

Categories: python

Read XML painlessly

October 30, 2011

I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way for this? After all, it’s Python, so everything should be simple.

Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s on PyPI, so installation is very easy:

sudo pip install untangle

For some examples, visit the project page.

Use Case
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.

#!/usr/bin/env python

import untangle

#XML = 'examples/planet_python.xml'     # can read a file too
XML = ''    # URL of the Planet Python RSS feed

o = untangle.parse(XML)
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print title
        print '   ', link

It couldn’t be any simpler :)

According to Chris, untangle doesn’t support documents with namespaces (yet).

Alternatives (update 20111031)
Here are some alternatives (thanks reddit).

lxml and amara are heavyweight solutions and are built upon C libraries so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice to read a small and simple XML file.

Categories: python

Write XML to file

April 4, 2011


I wanted to create an XML file. The file was simple but I wanted to avoid producing it with “print” commands. Which API should be used for this purpose? The produced XML should be human readable, i.e. pretty printed (indented).


This post is based on the thread Best XML writing tool for Python.

(1) elementtree.SimpleXMLWriter (no indenting)

The SimpleXMLWriter module contains a simple helper class for applications that need to generate well-formed XML data. The interface is very simple:

#!/usr/bin/env python

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)
html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()    # closes <head>

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)    # closes every element that is still open


However, the output is not indented, and as far as I can tell this feature is missing :( Here is the output of the code above:

<html><head><title>my document</title><meta name="generator" value="my application 1.0" /></head><body><h1>this is a heading</h1><p>this is a paragraph</p><p>this is <b>bold</b> and <i>italic</i>.</p></body></html>

If we prettify it, it will look like this:

<?xml version="1.0"?>
<html>
  <head>
    <title>my document</title>
    <meta name="generator" value="my application 1.0"/>
  </head>
  <body>
    <h1>this is a heading</h1>
    <p>this is a paragraph</p>
    <p>this is <b>bold</b> and <i>italic</i>.</p>
  </body>
</html>
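
One way to prettify such a one-line string after the fact is the standard library's minidom. This is only a sketch: the prettify helper is my own name, and the sample is a shortened version of the output above without the mixed-content paragraph, since minidom may add extra whitespace around inline elements like <b> and <i>.

```python
from xml.dom import minidom


def prettify(xml_string, indent="    "):
    """Return an indented copy of a compact XML string."""
    return minidom.parseString(xml_string).toprettyxml(indent=indent)


# Shortened, hypothetical version of the unindented output.
compact = (
    "<html><head><title>my document</title></head>"
    "<body><h1>this is a heading</h1><p>this is a paragraph</p></body></html>"
)
pretty = prettify(compact)
```

The result starts with an XML declaration line, followed by one element per line.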

You can install elementtree from PyPI.

(2) lxml.etree (can do indenting)

This is what I chose for my project. This API is also very easy to use and it can do indenting. Documentation is here.


#!/usr/bin/env python

from lxml import etree as ET

root = ET.Element('background')
starttime = ET.SubElement(root, 'starttime')
hour = ET.SubElement(starttime, 'hour')
hour.text = '00'
minute = ET.SubElement(starttime, 'minute')
minute.text = '00'
second = ET.SubElement(starttime, 'second')
second.text = '01'

print ET.tostring(root, pretty_print=True, xml_declaration=True)
# write to file:
# tree = ET.ElementTree(root)
# tree.write('output.xml', pretty_print=True, xml_declaration=True)


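Output:

<?xml version='1.0' encoding='ASCII'?>
<background>
  <starttime>
    <hour>00</hour>
    <minute>00</minute>
    <second>01</second>
  </starttime>
</background>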

On PyPI, you can find lxml here. However, you will have to install some additional packages too:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
# until Ubuntu 10.10:
sudo apt-get install python2.6-dev
# from Ubuntu 11.04:
sudo apt-get install python2.7-dev
# under Ubuntu 14.04 I needed this too:
sudo apt-get install -y zlib1g-dev

Then, you can install the library with “sudo pip install lxml”.
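
If installing the extra packages is not an option, something similar can be sketched with the standard library's ElementTree. This is a swapped-in alternative, not lxml: it assumes Python 3.9 or newer, where ET.indent() is available, and it builds the same small tree as the lxml example above.

```python
import xml.etree.ElementTree as ET

root = ET.Element("background")
starttime = ET.SubElement(root, "starttime")
for tag, value in (("hour", "00"), ("minute", "00"), ("second", "01")):
    ET.SubElement(starttime, tag).text = value

ET.indent(root, space="  ")    # in-place indentation, Python 3.9+
xml_text = ET.tostring(root, encoding="unicode")

# write to file:
# ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
```

With encoding="unicode", tostring() returns a str without an XML declaration; the commented-out write() call adds one.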

