
Python: Convert multiple columns of a CSV file to nested JSON

This is my input CSV file with multiple columns. I would like to convert it to a JSON file with department, departmentID, and one nested field called customer that holds first and last.

department, departmentID, first, last
fans, 1, Caroline, Smith
fans, 1, Jenny, White
students, 2, Ben, CJ
students, 2, Joan, Carpenter
...

The output JSON file I need:

[
  {
    "department": "fans",
    "departmentID": "1",
    "customer": [
      {
        "first": "Caroline",
        "last": "Smith"
      },
      {
        "first": "Jenny",
        "last": "White"
      }
    ]
  },
  {
    "department": "students",
    "departmentID": "2",
    "customer": [
      {
        "first": "Ben",
        "last": "CJ"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]

My code:

from csv import DictReader
from itertools import groupby
with open('data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    groups = []
    uniquekeys = []

    for k, g in groupby(data, lambda r: (r['group'], r['groupID'])):
        groups.append({
            "group": k[0],
            "groupID": k[1],
            "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)]
        })
        uniquekeys.append(k)

pprint(groups)

My issue is that groupID shows up twice in the output: once at the top level and once inside the nested JSON. What I want is to use group and groupID as the groupby key.

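The problem is that the key names are mixed up. The line "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)] only filters out a key named 'group', so any other grouping key (such as groupID) stays in every nested dictionary. If the CSV header is department, departmentID as in your sample, there is no 'group' key at all, so nothing is removed.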
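To make the effect concrete, here is a small sketch using one row from the sample CSV. The row dictionary and the strip_keys helper are made up for illustration and are not part of the original script.

```python
# One row as DictReader would produce it from the sample CSV (illustrative data).
row = {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"}


def strip_keys(d, excluded):
    """Return a copy of d without the keys listed in excluded (hypothetical helper)."""
    return {key: value for key, value in d.items() if key not in excluded}


# Filtering on a key that does not exist removes nothing.
unchanged = strip_keys(row, ["group"])

# Filtering on both grouping keys leaves only the customer fields.
customer = strip_keys(row, ["department", "departmentID"])

print(unchanged == row)
print(customer)
```

The filter has to name exactly the columns used as the grouping key, otherwise they are copied into every nested entry.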

I am not sure which key names you want, so the following example assumes that data.csv looks exactly like the one in your question (with department and departmentID columns), and the script renames them to group and groupID in the output.

from csv import DictReader
from itertools import groupby
from pprint import pprint

with open('data.csv') as csvfile:
    # skipinitialspace=True drops the space after each comma in the sample file
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

groups = []

# groupby only merges consecutive rows, so data.csv must already be ordered by department
for k, g in groupby(data, lambda r: (r['department'], r['departmentID'])):
    groups.append({
        "group": k[0],
        "groupID": k[1],
        # keep every column except the two grouping keys
        "user": [{key: value for key, value in d.items()
                  if key not in ('department', 'departmentID')} for d in g]
    })

pprint(groups)

Output:

[{'group': 'fans',
  'groupID': '1',
  'user': [{'first': 'Caroline', 'last': 'Smith'},
           {'first': 'Jenny', 'last': 'White'}]},
 {'group': 'students',
  'groupID': '2',
  'user': [{'first': 'Ben', 'last': 'CJ'},
           {'first': 'Joan', 'last': 'Carpenter'}]}]
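If you want the key names from the question (department, departmentID, customer) and real JSON rather than a pprint of Python objects, something along these lines may work. This is only a sketch: the csv_to_nested function name is my own, and the sample is read from an in-memory string instead of data.csv so that it runs on its own.

```python
import csv
import io
import json
from itertools import groupby

GROUP_KEYS = ("department", "departmentID")


def csv_to_nested(csvfile):
    """Group rows by (department, departmentID); nest the other columns under 'customer'."""
    rows = list(csv.DictReader(csvfile, skipinitialspace=True))
    result = []
    # Assumes rows with the same department are already adjacent in the file.
    for (department, department_id), members in groupby(
        rows, key=lambda r: (r["department"], r["departmentID"])
    ):
        result.append({
            "department": department,
            "departmentID": department_id,
            "customer": [
                {k: v for k, v in m.items() if k not in GROUP_KEYS} for m in members
            ],
        })
    return result


# In-memory copy of the sample from the question (stands in for open('data.csv')).
sample = io.StringIO(
    "department, departmentID, first, last\n"
    "fans, 1, Caroline, Smith\n"
    "fans, 1, Jenny, White\n"
    "students, 2, Ben, CJ\n"
    "students, 2, Joan, Carpenter\n"
)

nested = csv_to_nested(sample)
json_text = json.dumps(nested, indent=2)
print(json_text)
```

To write a file instead of printing, json.dump(nested, f, indent=2) on a file opened for writing should do it. Note that departmentID stays a string because the csv module does not convert types.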

I used different key names for the input and the output so that it is obvious which line does what, and so that the script is easy to adapt to other key names.
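One caveat worth keeping in mind: itertools.groupby only merges adjacent rows with equal keys, so a CSV that is not already ordered by department would produce the same department more than once. A minimal sketch of the difference, using made-up unsorted rows:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative rows in which the two "fans" entries are not adjacent.
rows = [
    {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"},
    {"department": "students", "departmentID": "2", "first": "Ben", "last": "CJ"},
    {"department": "fans", "departmentID": "1", "first": "Jenny", "last": "White"},
]

key = itemgetter("department", "departmentID")

# Without sorting, "fans" starts a new group each time it reappears.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]

# Sorting by the same key first makes equal keys adjacent.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]

print(unsorted_keys)
print(sorted_keys)
```

If the input order cannot be guaranteed, sorting data with the same key function before calling groupby is probably the simplest fix.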
