Create nested JSON from CSV
I already read Create nested JSON from flat csv, but it didn't help in my case.
I have quite a big spreadsheet created with Google Docs consisting of 11 rows and 74 columns (some columns are not occupied).
I created an example on Google Drive. When exported as a CSV it looks like this:
id,name,email,phone,picture01,picture02,picture03,status
1,Alice,alice@gmail.com,2131232,"image01_01
[this is an image]",image01_02,image01_03,single
2,Bob,bob@gmail.com,2854839,image02_01,"image02_02
[description to image 2]",,married
3,Frank,frank@gmail.com,987987,image03_01,image03_02,,single
4,Shawn,shawn@gmail.com,,image04_01,,,single
Now I would like to have a JSON structure that looks like this:
{
  "persons": [
    {
      "type": "config.profile",
      "id": "1",
      "email": "alice@gmail.com",
      "pictureId": "p01",
      "statusId": "s01"
    },
    {
      "type": "config.pictures",
      "id": "p01",
      "album": [
        {
          "image": "image01_01",
          "description": "this is an image"
        },
        {
          "image": "image_01_02",
          "description": ""
        },
        {
          "image": "image_01_03",
          "description": ""
        }
      ]
    },
    {
      "type": "config.status",
      "id": "s01",
      "status": "single"
    },
    {
      "type": "config.profile",
      "id": "2",
      "email": "bob@gmail.com",
      "pictureId": "p02",
      "statusId": "s02"
    },
    {
      "type": "config.pictures",
      "id": "p02",
      "album": [
        {
          "image": "image02_01",
          "description": ""
        },
        {
          "image": "image_02_02",
          "description": "description to image 2"
        }
      ]
    },
    {
      "type": "config.status",
      "id": "s02",
      "status": "married"
    }
  ]
}
And so on for the other rows.
My theoretical approach would be to go through the CSV file row by row (here the first problem starts: usually a row is exactly one line, but sometimes it spans several, so do I need to count the commas?). Each row corresponds to a config.profile block, including the id, email, pictureId, and statusId (the latter two are generated from the row number).
Then for each row a config.pictures block is generated with the same id as the pictureId inserted in the config.profile block. The album is an array with as many elements as there are pictures in the row.
Lastly, each row has a config.status block, which, again, has the same id as the statusId given in config.profile, and one status entry with the corresponding status.
I'm entirely clueless about how to create the nested and conditional JSON file.
I only got to the point where I convert the CSV to valid JSON, without any nesting and without the additional info that is not directly given in the CSV, like the type, pictureId, statusId, and so on.
Any help is appreciated. If it is easier to program this in another scripting language (like ruby), I would gladly switch to that.
Before someone thinks this is homework or whatnot: it is not. I just want to automate an otherwise very tiresome copy-and-paste task.
The csv module will handle the CSV reading nicely, including handling line breaks that are within quotes.
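import csv

# newline='' lets the csv module handle line breaks inside quoted fields itself
with open('my_csv.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        print(row)  # do work with each row here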
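To see the quoted line breaks being handled, here is a small sketch that feeds the header and Alice's row from the question to csv.reader. It uses io.StringIO in place of a real file, which is only an illustration device and not part of the original setup:

```python
import csv
import io

# Header plus Alice's row from the question; her picture01 cell spans two lines.
SAMPLE = (
    'id,name,email,phone,picture01,picture02,picture03,status\n'
    '1,Alice,alice@gmail.com,2131232,"image01_01\n'
    '[this is an image]",image01_02,image01_03,single\n'
)

rows = list(csv.reader(io.StringIO(SAMPLE, newline='')))

# Three physical lines, but only two logical rows: the header and Alice.
print(len(rows))
# The line break survives inside the quoted cell.
print(repr(rows[1][4]))
```

So there is no need to count commas: the reader works on quoted fields, not on physical lines.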
The csv.reader object is an iterator: you can iterate through the rows in the CSV by using a for loop. Each row is a list, so you can get each field as row[0], row[1], etc. Be aware that this will also yield the first row (which just contains field names in your case).
As we have field names given to us in the first row, we can use csv.DictReader so that fields in each row can be accessed as row['id'], row['name'], etc. This will also skip the first row for us:
import csv

with open('my_csv.csv', newline='') as csv_file:
    for row in csv.DictReader(csv_file):
        print(row['id'], row['name'])  # do work with each row here
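With named fields available, a picture cell can be separated into its image name and description. The sketch below assumes, based only on the sample data in the question, that the description sits on a second line inside square brackets. split_picture is a hypothetical helper name, and io.StringIO again stands in for the real file:

```python
import csv
import io

def split_picture(cell):
    """Split 'image<newline>[description]' into (image, description)."""
    image, _, rest = cell.partition('\n')
    description = rest.strip()
    # Assumption: descriptions are wrapped in square brackets, as in the sample.
    if description.startswith('[') and description.endswith(']'):
        description = description[1:-1]
    return image.strip(), description

# Header plus Bob's row from the question.
SAMPLE = (
    'id,name,email,phone,picture01,picture02,picture03,status\n'
    '2,Bob,bob@gmail.com,2854839,image02_01,"image02_02\n'
    '[description to image 2]",,married\n'
)

for row in csv.DictReader(io.StringIO(SAMPLE, newline='')):
    print(row['name'], split_picture(row['picture02']))
```

A cell without a second line simply yields an empty description.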
For the JSON export, use the json module. json.dumps() will take Python data structures such as lists and dictionaries and return the appropriate JSON string:
import json
my_data = {'id': 123, 'name': 'Test User', 'emails': ['test@example.com', 'test@hotmail.com']}
my_data_json = json.dumps(my_data)
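If the result is meant to be read by humans, the indent parameter of json.dumps() produces a multi-line layout like the one in the question. A minimal sketch, reusing the same sample dictionary:

```python
import json

my_data = {'id': 123, 'name': 'Test User',
           'emails': ['test@example.com', 'test@hotmail.com']}

# indent=2 pretty-prints the output; the data itself is unchanged.
pretty = json.dumps(my_data, indent=2)
print(pretty)

# Round trip: parsing the string gives back an equal structure.
restored = json.loads(pretty)
```

To write straight to an open file instead of building a string, json.dump() takes the same data plus a file object.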
If you want to generate JSON output exactly as you posted, you'd do something like:
import csv
import json

output = {'persons': []}
with open('my_csv.csv', newline='') as csv_file:
    for person in csv.DictReader(csv_file):
        output['persons'].append({
            'type': 'config.profile',
            'id': person['id'],
            # ...add other fields (email etc) here...
        })
        # ...do similar for config.pictures, config.status, etc...
output_json = json.dumps(output)
output_json will contain the JSON output that you want.
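Filling in the elided parts, a fuller sketch might look like the following. It rests on several assumptions that are not confirmed by the question: picture columns are recognised by the prefix "picture", descriptions sit on a second line in square brackets, and the IDs are the row number zero-padded to two digits (p01, s01). The names split_picture, build_blocks and convert are hypothetical, and the CSV is read from a string so the example runs on its own:

```python
import csv
import io
import json

SAMPLE = '''id,name,email,phone,picture01,picture02,picture03,status
1,Alice,alice@gmail.com,2131232,"image01_01
[this is an image]",image01_02,image01_03,single
2,Bob,bob@gmail.com,2854839,image02_01,"image02_02
[description to image 2]",,married
3,Frank,frank@gmail.com,987987,image03_01,image03_02,,single
4,Shawn,shawn@gmail.com,,image04_01,,,single
'''

def split_picture(cell):
    """Split 'image<newline>[description]' into (image, description)."""
    image, _, rest = cell.partition('\n')
    description = rest.strip()
    if description.startswith('[') and description.endswith(']'):
        description = description[1:-1]
    return image.strip(), description

def build_blocks(row, number, picture_columns):
    """Return the profile, pictures and status blocks for one CSV row."""
    picture_id = f'p{number:02d}'  # assumed ID scheme: p01, p02, ...
    status_id = f's{number:02d}'   # assumed ID scheme: s01, s02, ...
    album = []
    for column in picture_columns:
        cell = row.get(column) or ''
        if cell.strip():  # skip unoccupied picture columns
            image, description = split_picture(cell)
            album.append({'image': image, 'description': description})
    return [
        {'type': 'config.profile', 'id': row['id'], 'email': row['email'],
         'pictureId': picture_id, 'statusId': status_id},
        {'type': 'config.pictures', 'id': picture_id, 'album': album},
        {'type': 'config.status', 'id': status_id, 'status': row['status']},
    ]

def convert(csv_file):
    """Build the {'persons': [...]} structure from an open CSV file."""
    reader = csv.DictReader(csv_file)
    picture_columns = [name for name in (reader.fieldnames or [])
                       if name.startswith('picture')]
    output = {'persons': []}
    for number, row in enumerate(reader, start=1):
        output['persons'].extend(build_blocks(row, number, picture_columns))
    return output

result = convert(io.StringIO(SAMPLE, newline=''))
print(json.dumps(result, indent=2))
```

For the real spreadsheet, the io.StringIO(...) argument would be replaced by a file opened with open('my_csv.csv', newline=''). Image names are copied verbatim from the CSV cells.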
However, I'd suggest you carefully consider the structure of the JSON output that you're after. At the moment, you're defining an outer dictionary that serves no purpose, and you're adding all your 'config' data directly under 'persons'. You may want to reconsider this.
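Purely as an illustration of that remark, one possible alternative keeps each person's pictures and status inside a single entry, which makes the generated pictureId and statusId cross-references unnecessary. This layout is an assumption, not something the question asks for, and whether it fits depends on what consumes the JSON. nest_person is a hypothetical helper:

```python
import json

def nest_person(row, album):
    """Combine profile, status and album into one entry per person."""
    return {
        'id': row['id'],
        'email': row['email'],
        'status': row['status'],
        'album': album,
    }

# Hand-written stand-ins for one parsed CSV row and its album.
row = {'id': '1', 'email': 'alice@gmail.com', 'status': 'single'}
album = [{'image': 'image01_01', 'description': 'this is an image'}]

# A plain list at the top level, without the outer dictionary.
persons = [nest_person(row, album)]
print(json.dumps(persons, indent=2))
```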