
Create nested JSON from CSV

I already read "Create nested JSON from flat csv", but it didn't help in my case.

I have a fairly large spreadsheet created with Google Docs, consisting of 11 rows and 74 columns (some columns are not occupied).

I created an example on Google Drive. When exported as CSV, it looks like this:

id,name,email,phone,picture01,picture02,picture03,status
1,Alice,alice@gmail.com,2131232,"image01_01
[this is an image]",image01_02,image01_03,single
2,Bob,bob@gmail.com,2854839,image02_01,"image02_02
[description to image 2]",,married
3,Frank,frank@gmail.com,987987,image03_01,image03_02,,single
4,Shawn,shawn@gmail.com,,image04_01,,,single

Now I would like to have a JSON structure that looks like this:

{
    "persons": [
        {
            "type": "config.profile",
            "id": "1",
            "email": "alice@gmail.com",
            "pictureId": "p01",
            "statusId": "s01"
        },
        {
            "type": "config.pictures",
            "id": "p01",
            "album": [
                {
                    "image": "image01_01",
                    "description": "this is an image"
                },
                {
                    "image": "image01_02",
                    "description": ""
                },
                {
                    "image": "image01_03",
                    "description": ""
                }
            ]
        },
        {
            "type": "config.status",
            "id": "s01",
            "status": "single"
        },
        {
            "type": "config.profile",
            "id": "2",
            "email": "bob@gmail.com",
            "pictureId": "p02",
            "statusId": "s02"
        },
        {
            "type": "config.pictures",
            "id": "p02",
            "album": [
                {
                    "image": "image02_01",
                    "description": ""
                },
                {
                    "image": "image02_02",
                    "description": "description to image 2"
                }
            ]
        },
        {
            "type": "config.status",
            "id": "s02",
            "status": "married"
        }
    ]
}

And so on for the other rows.

My theoretical approach would be to go through the CSV file row by row (here the first problem starts: most rows occupy a single line, but some span several, so do I need to count the commas?). Each row corresponds to a config.profile block containing id, email, pictureId, and statusId (the latter two are generated from the row number).

Then, for each row, a config.pictures block is generated whose id matches the pictureId inserted in the config.profile block. The album is an array with as many elements as there are pictures.

Lastly, each row has a config.status block which, again, has an id matching the statusId given in config.profile, plus one status entry holding the corresponding status.

I'm entirely clueless about how to create the nested and conditional JSON file.

So far I have only got to the point of converting the CSV to valid JSON, without any nesting and without the additional information that is not directly given in the CSV, like type, pictureId, statusId, and so on.

Any help is appreciated. If it is easier to program this in another scripting language (like Ruby), I would gladly switch.

Before someone thinks this is homework or whatnot: it is not. I just want to automate an otherwise very tiresome copy-and-paste task.

The csv module will handle the CSV reading nicely, including line breaks that are within quotes.

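import csv

# newline='' lets the csv module handle line endings inside quoted fields itself
with open('my_csv.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        print(row)  # do work with the row here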

The csv.reader object is an iterator, so you can iterate through the rows in the CSV with a for loop. Each row is a list, so you can get each field as row[0], row[1], etc. Be aware that this will also yield the first row (which just contains field names in your case).
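To check the claim about quoted line breaks without touching a file, here is a minimal sketch that feeds two of the question's rows to csv.reader through an in-memory buffer. The SAMPLE string and the variable names are only illustrative.

```python
import csv
import io

# Two data rows copied from the question; Alice's first picture field
# contains a line break inside double quotes.
SAMPLE = (
    'id,name,email,phone,picture01,picture02,picture03,status\n'
    '1,Alice,alice@gmail.com,2131232,"image01_01\n'
    '[this is an image]",image01_02,image01_03,single\n'
    '4,Shawn,shawn@gmail.com,,image04_01,,,single\n'
)

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, alice, shawn = rows

print(len(rows))    # number of logical rows, not physical lines
print(repr(alice[4]))  # the quoted field keeps its embedded newline
print(repr(shawn[3]))  # an empty cell comes back as an empty string
```

So there is no need to count commas: the reader yields one list per logical row, however many physical lines it spans.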

As the field names are given in the first row, we can use csv.DictReader so that the fields in each row can be accessed as row['id'], row['name'], etc. This also consumes the header row for us:

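import csv

# newline='' lets the csv module handle line endings inside quoted fields itself
with open('my_csv.csv', newline='') as csv_file:
    for row in csv.DictReader(csv_file):
        print(row['id'], row['name'])  # do work with the row here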

For the JSON export, use the json module. json.dumps() takes Python data structures such as lists and dictionaries and returns the corresponding JSON string:

import json
my_data = {'id': 123, 'name': 'Test User', 'emails': ['test@example.com', 'test@hotmail.com']}
my_data_json = json.dumps(my_data)
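Since the desired output in the question is indented, it may help to know that json.dumps() accepts an indent argument. A small sketch, with made-up sample data:

```python
import json

my_data = {'id': 123, 'name': 'Test User', 'emails': ['test@example.com']}

# indent=4 pretty-prints the result; without it the output is a single line.
pretty = json.dumps(my_data, indent=4)
print(pretty)

# json.loads() is the inverse, which makes a quick round-trip check easy.
round_trip = json.loads(pretty)
```

To write to a file instead of building a string, json.dump() (without the "s") takes the data and an open file object.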

If you want to generate JSON output exactly as you posted, you'd do something like:

import csv
import json

output = {'persons': []}
with open('my_csv.csv', newline='') as csv_file:
    for person in csv.DictReader(csv_file):
        output['persons'].append({
            'type': 'config.profile',
            'id': person['id'],
            # ...add other fields (email etc) here...
        })

        # ...do similar for config.pictures, config.status, etc...

output_json = json.dumps(output)

Once the remaining fields and blocks are filled in, output_json will contain the JSON output that you want.
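For reference, here is one way the elided parts could be filled in. It is only a sketch under a few assumptions that the question does not spell out: pictureId and statusId are derived from the row number as "p01", "s01" and so on; every column whose name starts with "picture" belongs to the album; empty picture cells are skipped; and a description is whatever follows the first line break in a cell, minus the square brackets. The helper names split_picture and build_persons are made up for this example, and the CSV is read from an in-memory string so the snippet runs on its own.

```python
import csv
import io
import json

# The sample CSV from the question, inlined so the sketch is self-contained.
SAMPLE = (
    'id,name,email,phone,picture01,picture02,picture03,status\n'
    '1,Alice,alice@gmail.com,2131232,"image01_01\n'
    '[this is an image]",image01_02,image01_03,single\n'
    '2,Bob,bob@gmail.com,2854839,image02_01,"image02_02\n'
    '[description to image 2]",,married\n'
    '3,Frank,frank@gmail.com,987987,image03_01,image03_02,,single\n'
    '4,Shawn,shawn@gmail.com,,image04_01,,,single\n'
)


def split_picture(cell):
    """Turn 'image<newline>[description]' into one album entry (hypothetical helper)."""
    image, _, rest = cell.partition('\n')
    return {'image': image.strip(), 'description': rest.strip().strip('[]')}


def build_persons(csv_file):
    """Build the flat 'persons' list of profile/pictures/status blocks."""
    persons = []
    for number, row in enumerate(csv.DictReader(csv_file), start=1):
        # Assumption: the generated ids are based on the row number.
        picture_id = f'p{number:02d}'
        status_id = f's{number:02d}'
        persons.append({
            'type': 'config.profile',
            'id': row['id'],
            'email': row['email'],
            'pictureId': picture_id,
            'statusId': status_id,
        })
        # Assumption: album columns are those named picture..., empty cells skipped.
        album = [split_picture(value)
                 for column, value in row.items()
                 if column and column.startswith('picture') and value]
        persons.append({'type': 'config.pictures', 'id': picture_id, 'album': album})
        persons.append({'type': 'config.status', 'id': status_id, 'status': row['status']})
    return {'persons': persons}


output = build_persons(io.StringIO(SAMPLE))
output_json = json.dumps(output, indent=4)
print(output_json)
```

To run it against the real export, pass the file object from open('my_csv.csv', newline='') to build_persons instead of the io.StringIO buffer.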

However, I'd suggest you carefully consider the structure of the JSON output that you're after. At the moment, you're defining an outer dictionary that serves no purpose, and you're adding all your 'config' data directly under 'persons'. You may want to reconsider this.
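As one possible illustration of what a reconsidered structure might look like, the sketch below nests each person's pictures and status inside a single object and drops the outer dictionary. The key names pictures and status are hypothetical, and whether this layout is usable depends entirely on what consumes the JSON.

```python
import json

# Hypothetical alternative layout: a plain list with one object per person,
# and that person's pictures and status nested inside it.
persons = [
    {
        'id': '1',
        'email': 'alice@gmail.com',
        'pictures': [
            {'image': 'image01_01', 'description': 'this is an image'},
            {'image': 'image01_02', 'description': ''},
            {'image': 'image01_03', 'description': ''},
        ],
        'status': 'single',
    },
]

nested_json = json.dumps(persons, indent=4)
print(nested_json)
```

With this shape there is no need for generated pictureId and statusId values, because the relationship is expressed by nesting rather than by cross-references.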
