Convert JSON to CSV with complex arrays in Python

Question

I have several JSON files with nested data. Using Python, I was able to convert the simple ones to CSV with pandas:

import pandas as pd

df = pd.read_json(r'data.json')
df.to_csv(r'data.csv', index=False, header=True)

However, this only works for simple JSON files. Mine are more complex: they contain nested objects and arrays, and the nested data ends up merged into a single column. For example, take this sample data:

`data.json`

[
  {
    "id": 1,
    "name": {
      "english": "Bulbasaur",
      "french": "Bulbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 45,
      "Attack": 49,
      "Defense": 49
    }
  },
  {
    "id": 2,
    "name": {
      "english": "Ivysaur",
      "french": "Herbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 60,
      "Attack": 62,
      "Defense": 63
    }
  }
]

In the resulting CSV, any nested object or array past the first level is left as raw JSON text in a single cell (e.g. {'english': 'Bulbasaur', 'french': 'Bulbizarre'}). Ideally, each nested element should get its own column, named after the element.

On top of that, the other JSON files have different element names and order. Therefore, the script should catch all of the different element names and then convert them into CSV columns.

How can I achieve this?

Answer 1

Check out the flatten_json package:

import pandas as pd
from flatten_json import flatten
dic = [
  {
    "id": 1,
    "name": {
      "english": "Bulbasaur",
      "french": "Bulbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 45,
      "Attack": 49,
      "Defense": 49
    }
  },
  {
    "id": 2,
    "name": {
      "english": "Ivysaur",
      "french": "Herbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 60,
      "Attack": 62,
      "Defense": 63
    }
  }
]

dic_flattened = (flatten(d, '.') for d in dic)
df = pd.DataFrame(dic_flattened)

Output:

   id name.english name.french type.0  type.1  base.HP  base.Attack  base.Defense
0   1    Bulbasaur  Bulbizarre  Grass  Poison       45           49            49
1   2      Ivysaur  Herbizarre  Grass  Poison       60           62            63
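If it helps to see what the flattening step is doing, here is a minimal standard-library sketch of the same idea. `flatten_record` is a hypothetical helper written for this illustration, not the flatten_json implementation; it joins nested keys with a separator and uses list positions as key parts, which reproduces the column names shown above for the sample data.

```python
def flatten_record(value, sep=".", prefix=""):
    """Flatten nested dicts/lists into a single-level dict (illustrative helper)."""
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions become key parts: type.0, type.1
    else:
        # Scalar leaf: store it under the key path built so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_record(child, sep, path))
    return flat


record = {
    "id": 1,
    "name": {"english": "Bulbasaur", "french": "Bulbizarre"},
    "type": ["Grass", "Poison"],
    "base": {"HP": 45, "Attack": 49, "Defense": 49},
}
flat = flatten_record(record)
print(flat)
```

A list of such flat dicts can be passed to pd.DataFrame in the same way as the generator in the answer. Note that this sketch silently drops empty dicts and lists, which may or may not be what you want.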

Answer 2

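Using json_normalize will get you almost there, but to split the list into separate columns you need something extra:

import pandas as pd

df = pd.json_normalize(data)  # data is the list of records loaded from data.json
f = lambda x: 'type.{}'.format(x + 1)  # column names: type.1, type.2, ...
df = df.join(pd.DataFrame(df.pop('type').values.tolist()).rename(columns=f))

print(df)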

Output

   id name.english name.french  ...  base.Defense  type.1  type.2
0   1    Bulbasaur  Bulbizarre  ...            49   Grass  Poison
1   2      Ivysaur  Herbizarre  ...            63   Grass  Poison

[2 rows x 8 columns]
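The question also mentions files whose element names differ. One possible way to handle that with json_normalize, sketched under the assumption that each file has already been loaded into a list of records (the two small lists below are made-up stand-ins, not real files), is to normalize each file separately and let `pd.concat` align the columns by name. Cells that a file does not provide end up empty in the CSV.

```python
import pandas as pd

# Hypothetical stand-ins for the contents of two JSON files with different keys.
file_a = [{"id": 1, "name": {"english": "Bulbasaur"}, "base": {"HP": 45}}]
file_b = [{"id": 2, "name": {"french": "Herbizarre"}, "base": {"Attack": 62}}]

# Flatten each file on its own, then stack the frames; concat takes the
# union of the column names and fills the missing cells with NaN.
frames = [pd.json_normalize(records) for records in (file_a, file_b)]
combined = pd.concat(frames, ignore_index=True, sort=False)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Passing a filename to `to_csv` instead would write the combined table to disk.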

Answer 3

I'd suggest a for loop coupled with a defaultdict. For plain iteration (no aggregations), it is usually easier and faster to stay out of pandas until the final output:

from collections import defaultdict

df = defaultdict(list)

box = []
for entry in data:  # data is the sample data you shared
    for key, value in entry.items():
        if isinstance(value, dict):
            # nested object: one column per inner key, e.g. name.english
            temp = [(f"{key}.{k}", v) for k, v in value.items()]
        elif isinstance(value, list):
            # array: one column per position, numbered from 1, e.g. type.1
            temp = [(f"{key}.{k}", v) for k, v in enumerate(value, 1)]
        else:
            # scalar such as id: keep the key as the column name
            temp = [(key, value)]
        box.extend(temp)

for k, v in box:
    df[k].append(v)


df

defaultdict(list,
            {'id': [1, 2],
             'name.english': ['Bulbasaur', 'Ivysaur'],
             'name.french': ['Bulbizarre', 'Herbizarre'],
             'type.1': ['Grass', 'Grass'],
             'type.2': ['Poison', 'Poison'],
             'base.HP': [45, 60],
             'base.Attack': [49, 62],
             'base.Defense': [49, 63]})

Create the DataFrame:

import pandas as pd
pd.DataFrame(df)

   id name.english name.french type.1  type.2  base.HP  base.Attack  base.Defense
0   1    Bulbasaur  Bulbizarre  Grass  Poison       45           49            49
1   2      Ivysaur  Herbizarre  Grass  Poison       60           62            63
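The loop above stops at a DataFrame and builds each column independently, so columns can end up with different lengths when a record lacks a key. A possible variant, sketched here with the standard `csv` module, keeps one dict per record and lets `csv.DictWriter` fill the gaps. The helper name `flatten_one_level` and the second record are invented for this illustration.

```python
import csv
import io


def flatten_one_level(entry):
    """Apply the same one-level rules as the loop above, but per record."""
    row = {}
    for key, value in entry.items():
        if isinstance(value, dict):
            row.update({f"{key}.{k}": v for k, v in value.items()})
        elif isinstance(value, list):
            row.update({f"{key}.{i}": v for i, v in enumerate(value, 1)})
        else:
            row[key] = value
    return row


data = [
    {"id": 1, "name": {"english": "Bulbasaur"}, "type": ["Grass", "Poison"]},
    {"id": 2, "name": {"french": "Herbizarre"}, "type": ["Grass"]},  # made-up record
]
rows = [flatten_one_level(entry) for entry in data]

# Collect every column name across all records, in first-seen order.
fieldnames = list(dict.fromkeys(name for row in rows for name in row))

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("data.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Here `restval=""` is what leaves a cell empty when a record has no value for a column, so records with different element names can share one header row.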

3 answers

solution1
2 ACCEPTED 2020-12-25 20:30:33

solution2
1 2020-12-21 21:41:20

solution3
0 2020-12-21 22:26:08
