
Easiest way to split JSON file using Python

I am working on an interactive visualization of the World Happiness Report for the years 2015 through 2020. The data was split into 6 CSV files. Using pandas, I have successfully cleaned the data and concatenated the files into one big JSON file with the following format:

[
  {
    "Country": "Switzerland",
    "Year": 2015,
    "Happiness Rank": 1,
    "Happiness Score": 7.587000000000001
  },
  {
    "Country": "Iceland",
    "Year": 2015,
    "Happiness Rank": 2,
    "Happiness Score": 7.561
  },
  {
    "Country": "Switzerland",
    "Year": 2016,
    "Happiness Rank": 2,
    "Happiness Score": 7.5089999999999995
  },
  {
    "Country": "Iceland",
    "Year": 2016,
    "Happiness Rank": 3,
    "Happiness Score": 7.501
  },
  {
    "Country": "Switzerland",
    "Year": 2017,
    "Happiness Rank": 3,
    "Happiness Score": 7.49399995803833
  },
  {
    "Country": "Iceland",
    "Year": 2017,
    "Happiness Rank": 1,
    "Happiness Score": 7.801
  }
]

Now, I would like to programmatically format the JSON file such that it has the following format:

{
    "2015": {
        "Switzerland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.587000000000001
        },
        "Iceland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.561
        }
    },
    "2016": {
        "Switzerland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.5089999999999995
        },
        "Iceland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.501
        }
    },
    "2017": {
        "Switzerland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.49399995803833
        },
        "Iceland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.801
        }
    }
}

It has to be done programmatically, since there are over 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the appropriate data. If I want the rank of Iceland in 2015, I can then do data["2015"]["Iceland"]["Happiness Rank"].

Does anyone know the easiest / most convenient way to do this in Python?

If data is your original list of dictionaries:

from itertools import groupby
from operator import itemgetter


def by_year(data):
    # Fields to keep for each country; "Year" and "Country" become the keys.
    retain_keys = ("Happiness Rank", "Happiness Score")

    # groupby only merges consecutive records that share the same year.
    for year, group in groupby(data, key=itemgetter("Year")):
        # JSON object keys are strings, so the year is converted with str().
        yield str(year), {
            d["Country"]: {k: d[k] for k in retain_keys} for d in group
        }


print(dict(by_year(data)))

Output:

{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}

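This assumes that the dictionaries in data are already grouped together by year, because groupby only merges consecutive items that share the same key. If they are not, sort data by "Year" first.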
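
If the records might arrive in any order, one alternative is to skip groupby altogether and fill the nested dict with a plain loop. The following is only a minimal sketch: the function name nest_by_year and the sample records (scores rounded for brevity) are illustrative and not part of the original answer.

```python
import json


def nest_by_year(records, retain_keys=("Happiness Rank", "Happiness Score")):
    """Build {year: {country: {retained fields}}} regardless of input order."""
    nested = {}
    for record in records:
        # JSON object keys are always strings, so convert the year up front.
        year = str(record["Year"])
        # setdefault creates the per-year dict the first time a year is seen.
        nested.setdefault(year, {})[record["Country"]] = {
            key: record[key] for key in retain_keys
        }
    return nested


# Deliberately unsorted sample input.
records = [
    {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.509},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
]

nested = nest_by_year(records)
print(json.dumps(nested, indent=4))
```

Because each record is placed by key rather than by position, no sorting step is needed, at the cost of not using groupby.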

I assume you still have the original pandas DataFrame from which this JSON was created. With pandas, you can group it with df.groupby(['Year', 'Country']) and then follow the procedure in "pandas groupby to nested json" to convert the result to JSON.
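
As a rough sketch of the pandas route, assuming the cleaned DataFrame has the same four columns as the JSON in the question: this version groups by "Year" only and turns each group into a country-keyed dict, which is one possible approach and not necessarily the procedure from the linked post. The names df and nested and the rounded sample values are illustrative.

```python
import pandas as pd

# Small stand-in for the cleaned DataFrame.
df = pd.DataFrame(
    [
        {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
        {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
        {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.509},
        {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    ]
)

nested = {
    # One entry per year; the year is converted to a string to match JSON keys.
    str(year): group.drop(columns="Year")
    .set_index("Country")        # countries become the inner keys
    .to_dict(orient="index")     # {country: {column: value}}
    for year, group in df.groupby("Year")
}

print(nested["2015"]["Iceland"]["Happiness Rank"])
```

Unlike itertools.groupby, DataFrame.groupby does not need the rows to be sorted beforehand. to_dict(orient="index") requires the countries to be unique within each year, which holds for this data.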

You might find groupby from the itertools module useful. I was able to do this with:

import itertools
groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {str(year): {entry["Country"]:entry for entry in group} for year, group in groups}

Here, data is the list in the form of the example you gave.

This keeps all of the original fields (including "Year" and "Country") in each entry, but they can easily be deleted like this:

for countries in newdict.values():
    for c in countries.values():
        del c["Year"]
        del c["Country"]
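
Note that newdict holds the very same dict objects as data, so the del statements also remove those keys from the entries in data. If the original list should stay intact, a variant that builds new inner dicts may be preferable. This is only a sketch: the names drop_keys and ordered and the rounded sample values are illustrative, and the sorted call is there so that groupby sees each year as one consecutive run.

```python
import itertools
from operator import itemgetter

data = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]

drop_keys = {"Year", "Country"}

# groupby only merges consecutive items, so order the records by year first.
ordered = sorted(data, key=itemgetter("Year"))

newdict = {
    str(year): {
        # Copy every field except the ones that are now used as keys.
        entry["Country"]: {k: v for k, v in entry.items() if k not in drop_keys}
        for entry in group
    }
    for year, group in itertools.groupby(ordered, key=itemgetter("Year"))
}

print(newdict)
```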

