
CSV to nested JSON using Python/pandas

I'm trying to convert a flat CSV to a nested JSON format. This is my data:

# data.csv
company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000

I need to convert it to the following JSON structure:

{"data": [{
            "company_id": 1,
            "company_name": "Foobar Inc",
            "income": {"royalties": 5000000}
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": {
                "sales": 3000000,
                "rent": 1000000
            }
        }]
}

But my current code (based on this and using Python and the pandas library):

# script.py
import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    rec = {}

    rec['company_id'] = key[0]
    rec['company_name'] = key[1]

    for field in ['income_type']:
        income_types = list(grp[field].unique())
        rec['income'] = income_types

    return rec

records = []

for key, grp in df.groupby(['company_id','company_name','income_type','income_amt']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data = records)

print(json.dumps(records, indent=4))

Outputs this format:

{"data": [
        {
            "company_id": 1,
            "company_name": "Foobar Inc", 
            "income": [
                "royalties"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "sales"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "rent"
            ]
        }
    ]}

I'm hitting a wall figuring out how to combine rows with the same company_id into a single object and include the income_amt values.

You can do it like this:

records = []

# Group by company only, so each company yields exactly one record.
for key, grp in df.groupby('company_id'):
    records.append({
        "company_id": key,
        "company_name": grp.company_name.iloc[0],
        "income": {
            row.income_type: row.income_amt for row in grp.itertuples()
        }})

That gives you:

[{'company_id': 1,
  'company_name': 'Foobar Inc',
  'income': {'royalties': 5000000}},
 {'company_id': 2,
  'company_name': 'ACME Corp',
  'income': {'rent': 1000000, 'sales': 3000000}}]
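For completeness, here is a minimal end-to-end sketch that wraps the records under the "data" key and serializes them. It is one possible way to put the pieces together, not the only one. Two assumptions that are not in the original: the CSV is embedded as a string (the name CSV_TEXT is made up for this example) so the script runs without data.csv on disk, and the values are cast with int() in case pandas hands back NumPy integers, which json.dumps may refuse to serialize. It also groups by both company_id and company_name, which assumes each id maps to exactly one name. Grouping by all four columns, as in the question, puts every row in its own group, which is why three records came out instead of two.

```python
import io
import json

import pandas as pd

# Inline copy of data.csv from the question, so the sketch runs on its own.
# (CSV_TEXT is a hypothetical name used only for this example.)
CSV_TEXT = '''company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

records = []
# Group only by the columns that identify a company, so all of a
# company's income rows land in the same group.
for (company_id, company_name), grp in df.groupby(['company_id', 'company_name']):
    records.append({
        'company_id': int(company_id),  # plain int, safe for json.dumps
        'company_name': company_name,
        # Map each income_type to its income_amt within this company.
        'income': {
            income_type: int(amount)
            for income_type, amount in zip(grp['income_type'], grp['income_amt'])
        },
    })

result = {'data': records}
print(json.dumps(result, indent=4))
```

If a company could list the same income_type more than once, the dict comprehension would keep only the last amount; summing with grp.groupby('income_type')['income_amt'].sum() first would be one way around that.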


 