
Convert a CSV file to JSON with multiple nested fields

I have written code to convert a CSV file to nested JSON. Several columns need to be nested, so I assign each nested field separately. The problem is that I get two fields for the same column in the JSON output.

import csv
import json
from collections import OrderedDict

csv_file = 'data.csv'
json_file = csv_file + '.json'

def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        for row in reader:
            row['TYPE'] = 'REVIEW',   # adding new key, value 
            row['RAWID'] = 1,
            row['CUSTOMER'] = {
                "ID": row['CUSTOMER_ID'],
                "NAME": row['CUSTOMER_NAME']
            }
            row['CATEGORY'] = {
                "ID": row['CATEGORY_ID'],
                "NAME": row['CATEGORY']
            }
            del (row["CUSTOMER_NAME"], row["CATEGORY_ID"], 
            row["CATEGORY"], row["CUSTOMER_ID"])   # delete these, since the fields occur twice
            csv_rows.append(row)

    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')

if __name__ == '__main__':
    main(csv_file)
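A detail worth noting in the code above: `row['CATEGORY']` is reassigned to the nested dict before the `del`, so `del row["CATEGORY"]` removes that nested object rather than the original flat column. The sketch below replays just those steps on a hand-written row (values taken from the sample output) and contrasts them with `dict.pop()`, which is swapped in here as an alternative and is not part of the original code: it returns a column's value and removes the key in one step, so no separate `del` is needed. The function names are hypothetical.

```python
def nest_with_del(row):
    """Replay the question's CATEGORY steps only: reassign, then del."""
    row['CATEGORY'] = {"ID": row['CATEGORY_ID'], "NAME": row['CATEGORY']}
    # 'CATEGORY' now holds the nested dict, so this del removes it too.
    del row['CATEGORY_ID'], row['CATEGORY']
    return row


def nest_with_pop(row):
    """Move flat columns into nested dicts; pop() also removes each flat key."""
    # The right-hand side runs before the assignment, so the flat 'CATEGORY'
    # string is popped before the key is reused for the nested dict.
    row['CATEGORY'] = {"ID": row.pop('CATEGORY_ID'), "NAME": row.pop('CATEGORY')}
    row['CUSTOMER'] = {"ID": row.pop('CUSTOMER_ID'), "NAME": row.pop('CUSTOMER_NAME')}
    return row


# Hypothetical row, with values taken from the sample output.
sample = {'CATEGORY_ID': '1', 'CATEGORY': 'Consumers',
          'CUSTOMER_ID': '41', 'CUSTOMER_NAME': 'SA Port'}
print(nest_with_del(dict(sample)))
# → {'CUSTOMER_ID': '41', 'CUSTOMER_NAME': 'SA Port'}
print(nest_with_pop(dict(sample)))
# → {'CATEGORY': {'ID': '1', 'NAME': 'Consumers'}, 'CUSTOMER': {'ID': '41', 'NAME': 'SA Port'}}
```

Each function receives a copy (`dict(sample)`), so the two runs do not affect each other.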

The output is as follows:

[
    {
        "CATEGORY": {
            "ID": "1",
            "NAME": "Consumers"
        },
        "CATEGORY_ID": "1",
        "CUSTOMER_ID": "41",
        "CUSTOMER": {
            "ID": "41",
            "NAME": "SA Port"
        },
        "CUSTOMER_NAME": "SA Port",
        "RAWID": [
            1
        ]
    }
]
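The `"RAWID": [1]` in this output comes from the trailing commas in `row['TYPE'] = 'REVIEW',` and `row['RAWID'] = 1,`. In Python a trailing comma turns the right-hand side into a one-element tuple, and the `json` module serializes tuples as JSON arrays, so `TYPE` would likewise come out as `["REVIEW"]`. A minimal check:

```python
import json

rawid = 1,          # trailing comma: this is the tuple (1,), not the int 1
print(type(rawid))                    # → <class 'tuple'>
print(json.dumps({"RAWID": rawid}))   # → {"RAWID": [1]}
print(json.dumps({"RAWID": 1}))       # → {"RAWID": 1}
```

Dropping the trailing comma is enough to get a plain number or string in the JSON.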

I get two entries for each field I assign using row['...'] = ...: the original flat column (for example CUSTOMER_NAME) and the new nested object (CUSTOMER).

  1. Is there another way to get rid of these duplicates? I want only one entry per field in each record.
  2. Also, how can I convert the keys to lower case after reading the file with csv.DictReader()? All the columns in my CSV file are upper case, so I use upper-case keys when assigning, but I want all keys in lower case.
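For the second question, one option (not the approach used in the answer) is to lower-case the header once, before iterating. `csv.DictReader` reads the header row into its `fieldnames` attribute, and in Python 3 that attribute can be reassigned, after which every row uses the new names as keys. A sketch assuming the question's `|` delimiter; the in-memory `SAMPLE` text and the `read_lowercase_rows` helper are hypothetical, and the column layout is an assumption based on the keys the question uses.

```python
import csv
import io

# Hypothetical stand-in for data.csv: '|'-delimited, upper-case header.
SAMPLE = "CUSTOMER_ID|CUSTOMER_NAME|CATEGORY_ID|CATEGORY\n41|SA Port|1|Consumers\n"


def read_lowercase_rows(csvfile):
    """Return the CSV rows as dicts whose keys are the lower-cased headers."""
    reader = csv.DictReader(csvfile, delimiter='|')
    # Accessing .fieldnames reads the header line; assigning a new list
    # makes every following row use these names as its keys.
    if reader.fieldnames:
        reader.fieldnames = [name.lower() for name in reader.fieldnames]
    return [dict(row) for row in reader]


rows = read_lowercase_rows(io.StringIO(SAMPLE))
print(rows[0])
# → {'customer_id': '41', 'customer_name': 'SA Port', 'category_id': '1', 'category': 'Consumers'}
```

With the header lower-cased this way, the loop body can use keys such as `row['customer_id']` directly.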

To convert the keys to lower case, it is simpler to build a new dict for each row. This also gets rid of the duplicate fields, because only the keys you add explicitly end up in the output:

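    for row in reader:
        # Build a fresh dict per row: only the keys added here are output,
        # so the flat CUSTOMER_*/CATEGORY_* columns do not reappear.
        orow = OrderedDict()        # from collections import OrderedDict
        orow['type'] = 'REVIEW'     # no trailing comma: 'REVIEW', would be a tuple (a JSON list)
        orow['rawid'] = 1
        orow['customer'] = {
            "id": row['CUSTOMER_ID'],
            "name": row['CUSTOMER_NAME']
        }
        orow['category'] = {
            "id": row['CATEGORY_ID'],
            "name": row['CATEGORY']
        }
        csv_rows.append(orow)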
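Putting the pieces together, the complete conversion with this approach could look like the following. It is a self-contained sketch, not the original poster's script: it reads a hypothetical in-memory `SAMPLE` instead of `data.csv`, and the hypothetical `convert` helper returns the JSON text instead of writing `data.csv.json`.

```python
import csv
import io
import json
from collections import OrderedDict

# Hypothetical stand-in for data.csv; column names follow the question.
SAMPLE = "CUSTOMER_ID|CUSTOMER_NAME|CATEGORY_ID|CATEGORY\n41|SA Port|1|Consumers\n"


def convert(csvfile):
    """Return a JSON array with one nested, lower-case record per CSV row."""
    rows = []
    for row in csv.DictReader(csvfile, delimiter='|'):
        # Only the keys added here reach the output, so no flat duplicates.
        orow = OrderedDict()
        orow['type'] = 'REVIEW'
        orow['rawid'] = 1
        orow['customer'] = {"id": row['CUSTOMER_ID'], "name": row['CUSTOMER_NAME']}
        orow['category'] = {"id": row['CATEGORY_ID'], "name": row['CATEGORY']}
        rows.append(orow)
    return json.dumps(rows, indent=4, ensure_ascii=False)


print(convert(io.StringIO(SAMPLE)))
```

`sort_keys=True` from the question is left out on purpose: with it, `json.dump` sorts the keys alphabetically and the `OrderedDict` insertion order is lost. Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` is optional there.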
