CSV to structured nested JSON using Python

I'm trying to convert a flat CSV file into a nested JSON structure.

I have some data like this:

State   SubRegion        Postcode   Suburb
ACT     South Canberra   2620       Oaks Estate
ACT     North Canberra   2601       Acton
ACT     North Canberra   2602       Ainslie
ACT     Gungahlin-Hall   2914       Amaroo

I want output like this:

[
       {
          "name":"ACT",
          "regions":[
             {
                "name":"South Canberra",
                "suburbs":[
                   {
                      "postcode":"2620",
                      "name":"Oaks Estate"
                   }
                ]
             },
             {
                "name":"North Canberra",
                "suburbs":[
                   {
                      "postcode":"2601",
                      "name":"Acton"
                   },
                   {
                      "postcode":"2602",
                      "name":"Ainslie"
                   }
                ]
             },
             {
                "name":"Gungahlin-Hall",
                "suburbs":[
                   {
                      "postcode":"2914",
                      "name":"Amaroo"
                   }
                ]
             }
          ]
       }
    ]

I've tried to build this structure with pandas and with a plain script, but I haven't got the correct structure yet.

I think this should work:

import csv
import json 

def add_new_region(name, postcode, name2):
    d = {"name" : name,
     "suburbs" : [add_suburb(postcode, name2)]
     }
    return d
    
def add_suburb(postcode, name):
    return {"postcode" :  postcode,
              "name" : name
              }
    
datalist=[]
region_dict={}
region_dict_counter = 0
with open("data.csv", "r") as f:
    data = csv.reader(f)
    next(data) # skip headers
    for row in data:
        if row[0] in region_dict:
            # State already seen: look for an existing region with this name.
            for x in datalist[region_dict[row[0]]]["regions"]:
                if x["name"] == row[1]:
                    x["suburbs"].append(add_suburb(row[2], row[3]))
                    break
            else:
                # for/else: runs only when the loop did not break, i.e. the region is new.
                datalist[region_dict[row[0]]]["regions"].append(add_new_region(row[1], row[2], row[3]))
                    
        else:
            d = { "name" : row[0],
                 "regions" : [ add_new_region(row[1], row[2], row[3])]}
            datalist.append(d)
            region_dict[row[0]] = region_dict_counter
            region_dict_counter+=1
json_data=json.dumps(datalist, indent=4)
print(json_data)
with open("data.json", "w") as j:
    j.write(json_data)

I have solved this problem. Here is the solution:

import csv
import json

def getindex(convertedList, value):
    # Return the index of the first item whose 'name' equals value, or -1 if none matches.
    for index, item in enumerate(convertedList):
        if item['name'] == value:
            return index
    return -1

with open('Regions.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader, None)  # skip the header row
    mainData = []
    for row in reader:
        # Columns as in the sample data: 0 = State, 1 = SubRegion, 2 = Postcode, 3 = Suburb
        suburbObj = {
            'postcode': row[2],
            'name': row[3]
        }
        index = getindex(mainData, row[0])
        if index > -1:
            subindex = getindex(mainData[index]['regions'], row[1])
            if subindex > -1:
                # Known state and region: just add the suburb.
                mainData[index]['regions'][subindex]['suburbs'].append(suburbObj)
            else:
                # Known state, new region.
                mainData[index]['regions'].append({
                    'name': row[1],
                    'suburbs': [suburbObj]
                })
        else:
            # New state with its first region and suburb.
            mainData.append({
                'name': row[0],
                'regions': [{
                    'name': row[1],
                    'suburbs': [suburbObj]
                }]
            })

print(json.dumps(mainData, indent=4))

If anyone has better-optimized code, please post your solution.

Thanks
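
One possible shorter variant, offered as a sketch rather than a definitive answer: group the rows with nested dicts, so each state and region is found by key instead of by scanning a list, then convert the dicts to the list layout at the end. The function name nest_rows is made up for this example, and the sample rows are written as comma-separated text, which is an assumption about how the real file is delimited. It also assumes every row has exactly four columns.

```python
import csv
import io
import json


def nest_rows(rows):
    """Group (state, subregion, postcode, suburb) rows into the nested layout."""
    states = {}  # state name -> {region name -> list of suburb dicts}
    for state, region, postcode, suburb in rows:
        regions = states.setdefault(state, {})
        regions.setdefault(region, []).append(
            {"postcode": postcode, "name": suburb}
        )
    # Dicts keep insertion order (Python 3.7+), so states and regions
    # come out in the order they were first seen in the input.
    return [
        {
            "name": state,
            "regions": [
                {"name": region, "suburbs": suburbs}
                for region, suburbs in regions.items()
            ],
        }
        for state, regions in states.items()
    ]


# Sample rows from the question, assumed here to be comma-separated.
sample = io.StringIO(
    "State,SubRegion,Postcode,Suburb\n"
    "ACT,South Canberra,2620,Oaks Estate\n"
    "ACT,North Canberra,2601,Acton\n"
    "ACT,North Canberra,2602,Ainslie\n"
    "ACT,Gungahlin-Hall,2914,Amaroo\n"
)
reader = csv.reader(sample)
next(reader, None)  # skip the header row
result = nest_rows(reader)
print(json.dumps(result, indent=4))
```

To read from a file instead, replace the io.StringIO sample with open("data.csv", newline="") and pass the reader to nest_rows in the same way. Rows do not need to be sorted, since grouping is done by key.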
