
Python parse JSON and write output to CSV

I am trying to parse a JSON file which contains the following:

{"short_desc":
{"3641":[{"when":1002747507,"what":"DCR: Cant compare from outliner (1GDHJKK)","who":14},{"when":1002771621,"what":"DCR: Can't compare from outliner (1GDHJKK)","who":21}],
"3470":[{"when":1002747341,"what":"Can't compare code editions in type hierarchy view (1GGNI4W)","who":24},{"when":1002771649,"what":"DCR: Can't compare code editions in type hierarchy view (1GGNI4W)","who":21}]

I want to take the text under each "what" key, together with the number in front of each array (the bug id), and save them to a CSV file. The expected result is as follows:

id   | description
3641 | DCR: Cant compare from outliner (1GDHJKK)
3641 | DCR: Can't compare from outliner (1GDHJKK)
3470 | Can't compare code editions in type hierarchy view (1GGNI4W)
3470 | DCR: Can't compare code editions in type hierarchy view (1GGNI4W)

I tried the following code, trying to get only the value of "what", but got an error: KeyError: 'what'

import csv
import json
from glob import glob

# Input and output directories ("" means the current directory)
input_dir = ""
output_dir = ""

# Open the output CSV file and write the header row
output_file = output_dir + "coba.csv"
f_out = open(output_file, 'w', encoding='utf-8')
rowwriter = csv.writer(f_out, delimiter=',', lineterminator='\n')
outputrow = ['description']
rowwriter.writerow(outputrow)

# Define variables
inc = 0

with open('msr2013/data/v02/eclipse/short_desc.json', 'r', encoding='utf-8') as f:
    for line in f:
        bug = json.loads(line)

        # Extract the description (this is the line that raises KeyError: 'what')
        bug = bug['short_desc']['what']          

        # Write to main output file
        outputrow = [bug] 
        rowwriter.writerow(outputrow)

        inc += 1
        # Optional counter increments variables to track progress, useful for very large files.
        if inc%10000 == 0:
            print(inc)

# Close the output file
f_out.close()

Can anyone give me a solution?

Based on the sample data, you should do something like the following:

        bug = json.loads(line)

        # short_desc maps each bug id to a list of {when, what, who} records
        for _id in bug['short_desc']:
            for entry in bug['short_desc'][_id]:
                bug_what = entry['what']

                # Write the id and description to the main output file
                # (use ['id', 'description'] as the header row to match)
                outputrow = [_id, bug_what]
                rowwriter.writerow(outputrow)

bug['short_desc'] is an object mapping each id to an array of records, so "what" is not a key at that level; it only exists inside each record of those arrays.
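To see why the original line fails, here is a small sketch that rebuilds the question's sample as a Python dict (abbreviated, and assumed to be one complete JSON object per line of the file, as the question's loop expects). It shows that the keys under short_desc are ids, reproduces the KeyError, and flattens the records into (id, description) pairs:

```python
import json

# Sample data shaped like the question's file (abbreviated)
sample = {
    "short_desc": {
        "3641": [
            {"when": 1002747507, "what": "DCR: Cant compare from outliner (1GDHJKK)", "who": 14},
            {"when": 1002771621, "what": "DCR: Can't compare from outliner (1GDHJKK)", "who": 21},
        ],
        "3470": [
            {"when": 1002747341, "what": "Can't compare code editions in type hierarchy view (1GGNI4W)", "who": 24},
            {"when": 1002771649, "what": "DCR: Can't compare code editions in type hierarchy view (1GGNI4W)", "who": 21},
        ],
    }
}
# Round-trip through a JSON string, as json.loads(line) would produce it
bug = json.loads(json.dumps(sample))

# The keys one level below short_desc are bug ids, not "what"
print(list(bug["short_desc"]))  # ['3641', '3470']

try:
    bug["short_desc"]["what"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'what'

# "what" lives inside each record of each id's list
rows = [
    (bug_id, record["what"])
    for bug_id, records in bug["short_desc"].items()
    for record in records
]
for row in rows:
    print(row)
```

Each (id, description) pair corresponds to one row of the expected output table.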

You can use pandas to parse the JSON, apply the changes, and write the output to CSV:

import pandas as pd

data = pd.read_json('test.json')
data['id'] = data.index
# Keep only rows that have a short_desc list, as plain dicts
filter_data = data[data['short_desc'].notnull()].to_dict(orient='records')
# One row per record in each short_desc list, carrying its id along
inner_data = pd.json_normalize(filter_data, record_path='short_desc', meta=['id'], errors='ignore')
inner_data = inner_data[['id', 'what']].rename(columns={'what': 'description'})
inner_data.to_csv('test.csv', index=False)

Output:

[image of the resulting output not available]

Based on your data sample, there is no "what" key directly under "short_desc"; the keys at that level are the bug ids.

You should do something like this:

...
for line in f:
    bug = json.loads(line)
    for bug_id, bug_data in bug.get('short_desc', {}).items():
        # For example: bug_id='3470' and bug_data=[{...}, {...}]
        for row in bug_data:
            rowwriter.writerow([bug_id, row['what']])
...  
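For reference, the loop above can be wrapped into a complete script. This is a sketch under the same assumption as the question's code, namely that every line of the input file is one complete JSON object; the function name json_lines_to_csv is hypothetical, and the header row matches the expected id/description columns:

```python
import csv
import json


def json_lines_to_csv(input_path, output_path):
    """Write one (id, description) row per "what" entry.

    Assumes each non-blank line of input_path is a complete JSON object;
    lines without a "short_desc" key contribute no rows.
    Returns the number of data rows written.
    """
    count = 0
    # newline="" lets the csv module control line endings itself
    with open(input_path, encoding="utf-8") as f_in, \
            open(output_path, "w", encoding="utf-8", newline="") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["id", "description"])
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            bug = json.loads(line)
            # short_desc maps bug id -> list of {when, what, who} records
            for bug_id, records in bug.get("short_desc", {}).items():
                for record in records:
                    writer.writerow([bug_id, record["what"]])
                    count += 1
    return count
```

For example, json_lines_to_csv('msr2013/data/v02/eclipse/short_desc.json', 'coba.csv'). If the file instead holds a single JSON object spread over several lines, as the sample in the question is laid out, json.loads(line) will fail on the partial lines; in that case load the whole file once with json.load(f) and loop over data['short_desc'].items() instead.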
