Convert non-nested JSON to a CSV file?

Question

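I'm working with a non-nested JSON file; the data comes from reddit. I'm trying to convert it to a CSV file using Python. The rows don't all have the same fields, so I keep getting the following error: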

JSONDecodeError: Extra data: line 2 column 1

Here is the code:

import csv
import json
import os

os.chdir('c:\\Users\\Desktop')
infile = open("data.json", "r")
outfile = open("outputfile.csv", "w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):
    writer.writerow(row)

Here are a few lines from the data:

{"author":"i_had_an_apostrophe","body":"\"It's not your fault.\"","author_flair_css_class":null,"link_id":"t3_5c0rn0","subreddit":"AskReddit","created_utc":1478736000,"subreddit_id":"t5_2qh1i","parent_id":"t1_d9t3q4d","author_flair_text":null,"id":"d9tlp0j"}
{"id":"d9tlp0k","author_flair_text":null,"parent_id":"t1_d9tame6","link_id":"t3_5c1efx","subreddit":"technology","created_utc":1478736000,"subreddit_id":"t5_2qh16","author":"willliam971","body":"9/11 inside job??","author_flair_css_class":null}
{"created_utc":1478736000,"subreddit_id":"t5_2qur2","link_id":"t3_5c44bz","subreddit":"excel","author":"excelevator","author_flair_css_class":"points","body":"Have you tried stepping through the code to analyse the values at each step?\n\n","author_flair_text":"442","id":"d9tlp0l","parent_id":"t3_5c44bz"}
{"created_utc":1478736000,"subreddit_id":"t5_2tycb","link_id":"t3_5c384j","subreddit":"OldSchoolCool","author":"10minutes_late","author_flair_css_class":null,"body":"**Thanks Hillary**","author_flair_text":null,"id":"d9tlp0m","parent_id":"t3_5c384j"}

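I'm thinking of collecting every field that appears anywhere in the file to use as the CSV header, and filling in NA wherever a row has no data for a particular field.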

Answer 1

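Your question doesn't say exactly what you're trying to accomplish, so I'm guessing. Note that CSV files don't use a "null" value to represent missing fields; they simply have delimiters with nothing between them. For example, 1,2,,4,5 has no third field value.

Also, how you open the CSV file depends on whether you're using Python 2 or Python 3. The code below is for Python 3.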

#!/usr/bin/env python3
import csv
import json
import os

os.chdir('c:\\Users\\Desktop')

# The sample data has one JSON object per line, so parse it line by line;
# json.loads() on the whole file raises "Extra data".
with open('sampledata.json', 'r') as infile:
    data = [json.loads(line) for line in infile if line.strip()]

# determine all the keys present, which will each become csv fields
# (sorted so the column order is the same on every run)
fields = sorted(set(key for row in data for key in row))

with open('outputfile.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fields)
    writer.writeheader()
    writer.writerows(data)
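
The sample data in the question holds one JSON object per line (often called JSON Lines), which is most likely why parsing the whole file in a single json.loads call reports "Extra data" at line 2 column 1. A minimal sketch of the difference, using two shortened, made-up records in place of the real file:

```python
import json

# Two shortened, made-up records, one JSON object per line (illustrative data).
text = '{"id": "a1", "author": "alice"}\n{"id": "b2", "body": "hi"}\n'

# Parsing the whole text at once fails: a second object follows the first.
try:
    json.loads(text)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = (exc.msg, exc.lineno, exc.colno)

# Parsing each non-empty line separately works.
rows = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_file_error)
print(rows)
```

The error has nothing to do with rows having different fields; it only reflects that the file as a whole is not a single JSON document.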

Answer 2

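You can write a small function that builds the rows for you, pulling out each value when it's available and inserting None when it isn't. You call it the header, I call it the schema. Collect all the fields, remove duplicates and sort them, then build each record against the full set of fields and write those records to the CSV.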

import csv
import json

def build_record(row, schema):
    # Build one CSV row in schema order, using None for missing fields
    values = []
    for field in schema:
        if field in row:
            values.append(row[field])
        else:
            values.append(None)
    return tuple(values)

# One JSON object per line; skip blank lines
with open("data.json", "r") as infile:
    rows = [json.loads(line) for line in infile if line.strip()]

# All field names across all rows, de-duplicated and sorted
schema = tuple(sorted(set(k for r in rows for k in r.keys())))
records = [build_record(r, schema) for r in rows]

# Python 3: open in text mode with newline='' (on Python 2, open with "wb")
with open("outputfile.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(schema)
    for rec in records:
        writer.writerow(rec)
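
The same schema-then-records idea can be sketched more compactly with dict.get, which returns None for a missing key. The sample rows below are made up for illustration, and the output goes to an in-memory buffer rather than a real file:

```python
import csv
import io

def build_record(row, schema):
    # dict.get returns None when the field is absent from this row
    return tuple(row.get(field) for field in schema)

# Made-up rows with different sets of fields (illustrative data).
rows = [
    {"id": "a1", "author": "alice"},
    {"id": "b2", "body": "hi"},
]

# Every field name seen in any row, de-duplicated and sorted.
schema = tuple(sorted({key for row in rows for key in row}))
records = [build_record(row, schema) for row in rows]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(schema)
writer.writerows(records)
print(buffer.getvalue())
```

csv.writer writes None as an empty string, so missing fields come out as empty cells rather than the text "None".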

Answer 3

You can use Pandas to fill in the blanks for you (you may need to pip install pandas first):

import pandas as pd
import os

# load json
os.chdir('c:\\Users\\Desktop')
with open("data.json", "r") as infile:

    # read data into a Pandas DataFrame; lines=True because the file
    # has one JSON object per line
    df = pd.read_json(infile, lines=True)

# use Pandas to write to CSV
df.to_csv("myfile.csv")

Answer 4

I suggest using the csv.DictWriter class. It takes a file to write to and a list of field names (which I worked out from your data sample).

import csv
import json
import os

fieldnames = [
    "author", "author_flair_css_class", "author_flair_text", "body",
    "created_utc", "id", "link_id", "parent_id", "subreddit",
    "subreddit_id"
]

os.chdir('c:\\Users\\Desktop')
# newline='' keeps the csv module from writing blank lines between rows on Windows
with open("data.json", "r") as infile, \
        open("outputfile.csv", "w", newline="") as outfile:

    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    # each line of the input file is one JSON object
    for row in infile:
        row_dict = json.loads(row)
        writer.writerow(row_dict)
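
One caveat with a fixed field list: by default csv.DictWriter raises ValueError if a row contains a key that is not in fieldnames, and it writes an empty string for keys that are missing. A small sketch, with made-up rows and an in-memory buffer instead of a real file, of using restval and extrasaction to handle both cases:

```python
import csv
import io

fieldnames = ["id", "author", "body"]

# Made-up rows (illustrative data).
rows = [
    {"id": "a1", "author": "alice"},         # no "body"
    {"id": "b2", "body": "hi", "score": 3},  # no "author", extra "score"
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    restval="NA",           # written for keys missing from a row
    extrasaction="ignore",  # skip keys not in fieldnames instead of raising
)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Note that restval only covers keys that are absent from a row; a key that is present with a JSON null (None in Python) is still written as an empty field.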

4 answers

Answer 1
1 2017-01-27 02:14:13

Answer 2
0 2017-01-27 01:27:15

Answer 3
0 2017-01-27 01:37:06

Answer 4
0 accepted 2017-01-27 02:18:04
