
How to process big json file by each line and convert to csv efficiently?

I have a big file with a JSON string on each line, and I want to select a few attributes and save them as CSV. I have the following code for that. There are around 2 million lines, and I want to extract part of them, up to 1 million. I know a better solution would be not to store the JSONs in one file, so maybe there is a way to split it first?

Anyway, code like this hits the memory limit when it tries to build the whole dataframe at once. Could you suggest the most appropriate solution, please?

import json
from itertools import islice

import pandas as pd

data = []

with open("input.json") as f:
    # Pull the first 100,000 lines into memory in one go
    head = list(islice(f, 0, 100000))

    for line in head:
        # Cheap substring check before paying for json.loads
        if 'cat' in line:
            data.append(json.loads(line))

# Build a single dataframe from all parsed records at once
df = pd.json_normalize(data, errors='ignore')
df = df[['col1', 'col2', 'col3']]
df.to_csv("output.csv", header=True, sep=';')

Why are you keeping head in memory at all? You can iterate over the file directly and stop once you reach 1 million rows.
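A minimal sketch of that streaming approach, assuming (as in the question's code) that each line is a standalone JSON object, that lines of interest contain the substring 'cat', and that col1, col2, and col3 are top-level keys; it writes each matching row with the csv module immediately, so only one line is held in memory at a time:

import csv
import json

wanted = ['col1', 'col2', 'col3']  # columns from the question; adjust as needed
limit = 1_000_000                  # stop after this many matching rows

with open("input.json") as f, open("output.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=wanted, delimiter=';')
    writer.writeheader()
    written = 0
    for line in f:
        # Same cheap substring pre-filter as in the question
        if 'cat' not in line:
            continue
        record = json.loads(line)
        # Keep only the selected attributes; a missing key becomes an empty cell
        writer.writerow({k: record.get(k, '') for k in wanted})
        written += 1
        if written >= limit:
            break

If you would rather stay in pandas, read_json can also stream a line-delimited file as an iterator of dataframes via its chunksize parameter, so no more than one chunk is ever in memory. A sketch, again assuming the selected columns exist on every line:

import pandas as pd

wanted = ['col1', 'col2', 'col3']
first = True
for chunk in pd.read_json("input.json", lines=True, chunksize=100000):
    # Append each filtered chunk to the CSV, writing the header only once
    chunk[wanted].to_csv("output.csv", sep=';', mode='a', header=first, index=False)
    first = False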
