
detect duplicate keys in a JSON file

Problem
I want to edit a JSON file by hand, but I’m afraid that somewhere I might introduce a duplicate key by accident. If that happens, the second key silently overwrites the first one. Example:

$ cat input.json 
{
    "content": {
        "a": 1,
        "a": 2
    }
}

Naive approach:

import json

with open("input.json") as f:
    d = json.load(f)

print(d)

# {'content': {'a': 2}}

If there is a duplicate key, parsing should fail! But it stays silent, and you have no idea that you just lost some data.

Solution
I found the solution on Stack Overflow.

import json

def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        d[k] = v
    return d

def main():
    with open("input.json") as f:
        d = json.load(f, object_pairs_hook=dict_raise_on_duplicates)

    print(d)

if __name__ == "__main__":
    main()

Now you get a nice error message:

Traceback (most recent call last):
  File "./check_duplicates.py", line 28, in <module>
    main()
  File "./check_duplicates.py", line 21, in main
    d = json.load(f, object_pairs_hook=dict_raise_on_duplicates)
  File "/usr/lib64/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib64/python3.5/json/__init__.py", line 332, in loads
    return cls(**kw).decode(s)
  File "/usr/lib64/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "./check_duplicates.py", line 13, in dict_raise_on_duplicates
    raise ValueError("duplicate key: %r" % (k,))
ValueError: duplicate key: 'a'

If your JSON file has no duplicates, then the code above nicely prints its content.
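The same hook also works with json.loads when the JSON comes from a string rather than a file, which is handy for a quick check. A minimal sketch:

```python
import json

def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        d[k] = v
    return d

# Valid JSON parses as usual.
print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=dict_raise_on_duplicates))
# {'a': 1, 'b': 2}

# A duplicate key raises ValueError instead of silently dropping data.
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=dict_raise_on_duplicates)
except ValueError as e:
    print(e)
# duplicate key: 'a'
```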

  1. October 24, 2016 at 22:40

    The problem is that it will also apply to dictionaries that are values.
    { "a": {"b": "b", "b": "c"} } will fail. Is there a way to make it only look at the keys?
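    That is true: object_pairs_hook fires for every object in the document, not just the outermost one. One possible workaround is a sketch like the following (not from the original post; it assumes the top level of the document is a JSON object). Since the hook is called once per object, innermost first, the last recorded call belongs to the outermost object, so we can check only that one for duplicates:

```python
import json

def load_checking_top_level_only(text):
    """Reject duplicate keys only in the outermost JSON object.

    Assumes the top level of the document is a JSON object. The hook
    fires once per object, innermost first, so the last recorded call
    belongs to the outermost object.
    """
    calls = []

    def record_pairs(pairs):
        calls.append(pairs)
        return dict(pairs)  # keep the normal "last key wins" behavior

    result = json.loads(text, object_pairs_hook=record_pairs)
    keys = [k for k, _ in calls[-1]]
    duplicates = sorted(k for k in set(keys) if keys.count(k) > 1)
    if duplicates:
        raise ValueError("duplicate top-level key(s): %r" % (duplicates,))
    return result

# Nested duplicates pass (last value wins, as usual):
print(load_checking_top_level_only('{ "a": {"b": "b", "b": "c"} }'))
# {'a': {'b': 'c'}}

# Top-level duplicates still fail:
# load_checking_top_level_only('{"a": 1, "a": 2}')  # raises ValueError
```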

  2. July 4, 2019 at 17:47

    Hi. I’m working on a solution to a problem with duplicate mapping keys in swagger.json of the flask-restplus project: https://github.com/noirbizarre/flask-restplus/issues/661

    I came across this post when searching for a way to force python’s json decoder to require unique keys. I’d like to incorporate this solution into the project’s TestClient class which runs with pytest, with an attribution. Would that be okay? I thought I’d write something like:

    # Borrowed from https://pythonadventures.wordpress.com/2016/03/06/detect-duplicate-keys-in-a-json-file/
    # Thank you to WordPress author @ubuntuincident, a.k.a. Jabba Laci.

    Indeed, thanks. It saved me some time and trouble.

    • July 4, 2019 at 18:39

      Sure, of course you can use it. I also took this tip from Stack Overflow (as I mention in the post) :)

      • July 9, 2019 at 15:23

        Thank you. I skimmed the post and missed the link. Yours came up in the search results and Stack Overflow didn’t, unless it was further down, so I think you deserve credit for being discoverable.

