XML to dict / XML to JSON

Problem
You have an XML file and you want to convert it to dict or JSON.

Well, if you have a dict, you can convert it to JSON with json.dumps(), so the real question is: how do you convert an XML file to a dictionary?
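Just to make the dict-to-JSON step concrete, here is a minimal sketch using only the standard library. The dictionary below is made up for illustration; it merely has the nested shape you might expect from an XML conversion.

```python
import json

# A hypothetical dictionary, shaped like what an XML-to-dict conversion might return.
d = {"note": {"to": "Alice", "body": "Hello"}}

# json.dumps() returns a string; json.dump() writes to a file object instead.
text = json.dumps(d, indent=4)
print(text)
```

So once the dictionary exists, the JSON part is a one-liner.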

Solution
There is an excellent library for this purpose called xmltodict. Its usage is very simple:

import xmltodict

# It doesn't work with Python 3! Read on for the solution!
def convert(xml_file, xml_attribs=True):
    with open(xml_file) as f:
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return d

This worked well under Python 2.7, but I got an error under Python 3. I checked the project’s documentation and it claimed to be Python 3 compatible. What the hell?

The error message was this:

Traceback (most recent call last):
  File "/home/jabba/Dropbox/python/lib/jabbapylib2/apps/xmltodict.py", line 247, in parse
    parser.ParseFile(xml_input)
TypeError: read() did not return a bytes object (type=str)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./xml2json.py", line 27, in <module>
    print(convert(sys.argv[1]))
  File "./xml2json.py", line 17, in convert
    d = xmltodict.parse(f, xml_attribs=xml_attribs)
  File "/home/jabba/Dropbox/python/lib/jabbapylib2/apps/xmltodict.py", line 249, in parse
    parser.Parse(xml_input, True)
TypeError: '_io.TextIOWrapper' does not support the buffer interface

I even filed an issue ticket :)

After some debugging I found a hint here: you need to open the XML file in binary mode!
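Judging from the traceback, the file object ends up in an expat parser's ParseFile() call, and under Python 3 that call wants read() to return bytes. The sketch below reproduces the difference with the standard library only, so it doesn't depend on xmltodict at all. The parse_stream() helper and the sample XML are my own names for illustration, and the in-memory streams stand in for files opened in "rb" and "r" mode.

```python
import io
from xml.parsers import expat

XML = "<root><item>1</item></root>"

def parse_stream(stream):
    """Feed a file-like object to expat and return the element names seen."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.ParseFile(stream)
    return names

# Binary stream: read() returns bytes, which is what ParseFile() expects.
binary_result = parse_stream(io.BytesIO(XML.encode("utf-8")))

# Text stream: read() returns str, so Python 3 raises a TypeError.
try:
    parse_stream(io.StringIO(XML))
    text_error = None
except TypeError as exc:
    text_error = exc

print(binary_result)
print(type(text_error).__name__)
```

The text stream fails with the same kind of TypeError as in the first traceback above, while the binary one parses fine.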

XML to dict (Python 2 & 3)
So the correct version that works with Python 3 too is this:

import xmltodict

def convert(xml_file, xml_attribs=True):
    with open(xml_file, "rb") as f:    # notice the "rb" mode
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return d

XML to JSON (Python 2 & 3)
If you want JSON output:

import json
import xmltodict

def convert(xml_file, xml_attribs=True):
    with open(xml_file, "rb") as f:    # notice the "rb" mode
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return json.dumps(d, indent=4)