How to convert CSV to nested JSON in Python

I have a csv file in the following format:

a b c d e
1 2 3 4 5
9 8 7 6 5

I want to convert this csv file to Nested JSON format, like this:

[{"a": 1,
"Purchase" : {
              "b": 2,
              "c": 3,
              "d": 4},
"Sales": {
           "d": 4,
           "e": 5}},
{"a": 9,
"Purchase" : {
              "b": 8,
              "c": 7},
"Sales": {
           "d": 6,
           "e": 5}}]

How can I do this transformation in Python? Keep in mind this is only a sample table; my real table has many columns and thousands of rows, so doing it by hand is not practical.

So far I have tried this code:

import csv

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        r["purchase"] = {"b": r['b'],
                        "c": r['c'],
                        }

Here I am trying to add the nested key-value pair that my target dictionary needs, but without success. I would do the same for Sales; this is just a sample.

A simple way is to add the nested columns first, then use the to_json method in pandas:

import pandas as pd

df = pd.read_csv('your_file.csv')
# Each new column holds one dict per row, built from the selected columns.
df['Purchase'] = df[['b','c','d']].to_dict('records')
df['Sales'] = df[['d','e']].to_dict('records')
out = df[['a', 'Purchase', 'Sales']].to_json(orient='records', indent=4)
print(out)

Output:

[
    {
        "a":1,
        "Purchase":{
            "b":2,
            "c":3,
            "d":4
        },
        "Sales":{
            "d":4,
            "e":5
        }
    },
    {
        "a":9,
        "Purchase":{
            "b":8,
            "c":7,
            "d":6
        },
        "Sales":{
            "d":6,
            "e":5
        }
    }
]

You don't need any third-party libraries for this; the standard csv and json modules are enough. Just specify the right dialect, e.g. for tab-separated files:

import csv
import json


with open("tmp4.csv", "r") as f:
    result = [
        {
            "a": row["a"],
            "Purchase": {
                "b": row["b"],
                "c": row["c"],
            },
            "Sales": {
                "d": row["d"],
                "e": row["e"],
            },
        }
        for row in csv.DictReader(f, dialect='excel-tab')
    ]
assert (
    json.dumps(result)
    == '[{"a": "1", "Purchase": {"b": "2", "c": "3"}, "Sales": {"d": "4", "e": "5"}}, {"a": "9", "Purchase": {"b": "8", "c": "7"}, "Sales": {"d": "6", "e": "5"}}]'
)
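Note that csv.DictReader yields every value as a string, which is why the assertion above compares against "1" rather than 1. If numeric output like that in the question is wanted, the values can be converted while the records are built. A minimal sketch, assuming every column holds integers; the nest_row and convert helpers and the inline sample data are made up for illustration:

```python
import csv
import io
import json

# Inline tab-separated sample standing in for the real file (assumption).
SAMPLE = "a\tb\tc\td\te\n1\t2\t3\t4\t5\n9\t8\t7\t6\t5\n"


def nest_row(row):
    """Build one nested record, converting each string value to int."""
    return {
        "a": int(row["a"]),
        "Purchase": {"b": int(row["b"]), "c": int(row["c"])},
        "Sales": {"d": int(row["d"]), "e": int(row["e"])},
    }


def convert(f):
    """Read tab-separated rows from an open file and nest each one."""
    return [nest_row(row) for row in csv.DictReader(f, dialect="excel-tab")]


result = convert(io.StringIO(SAMPLE))
print(json.dumps(result, indent=4))
```

With a real file, convert(open(...)) inside a with block would replace the io.StringIO call. int() raises ValueError on empty or non-numeric cells, so real data may need a more forgiving conversion.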

When you do r["purchase"] = {"b": ...}, you're assigning the dictionary back to the per-line object r, which is discarded at the end of each loop iteration. Instead, create a new dictionary per record and append it to a list, like this:

import csv

result = []
with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        result.append({
            "a": r["a"],
            "Purchase" : {
                "b": r["b"],
                "c": r["c"],
                "d": r["d"],
            },
            "Sales": {
                "d": r["d"],
                "e": r["e"],
            },
        })

Or use a list comprehension to create result:

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    result = [{
        "a": r["a"],
        "Purchase" : {
            "b": r["b"],
            "c": r["c"],
            "d": r["d"],
        },
        "Sales": {
            "d": r["d"],
            "e": r["e"],
        },
    } for r in reader]
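Since the real table has many columns, spelling out every key by hand gets tedious. One option is to describe the nesting once in a mapping and build each record from it. A sketch under that assumption; the names TOP_LEVEL, GROUPS and nest are hypothetical, and the sample data is inlined as comma-separated text:

```python
import csv
import io
import json

# Hypothetical layout: columns kept at the top level, and group name -> columns.
TOP_LEVEL = ["a"]
GROUPS = {
    "Purchase": ["b", "c", "d"],
    "Sales": ["d", "e"],
}


def nest(row):
    """Turn one flat DictReader row into a nested record."""
    record = {col: row[col] for col in TOP_LEVEL}
    for group, cols in GROUPS.items():
        record[group] = {col: row[col] for col in cols}
    return record


# Inline sample standing in for new_data.csv (assumption: comma-separated).
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n9,8,7,6,5\n")
result = [nest(r) for r in csv.DictReader(sample)]

# Values stay strings, exactly as csv.DictReader returns them.
print(json.dumps(result, indent=4))
```

Adding a column or a new group then only means editing GROUPS, not the loop. json.dump(result, f, indent=4) would write the same structure straight to an open file.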
