
How to process big json file by each line and convert to csv efficiently?

I have a big file with a JSON string on each line, and I want to select a few attributes and save them as CSV. I have the following code for that. There are around 2 million lines, and I want to extract part of them, up to 1 million. I know a better solution would be not to store the JSONs in one file, so maybe there is a way to split it first?

Anyway, code like this hits the memory limit when it tries to build the whole dataframe at once. Could you suggest the most appropriate solution, please?

import json
from itertools import islice

import pandas as pd

data = []

with open("input.json") as f:
    # Pull the first 100,000 lines into memory in one go
    head = list(islice(f, 0, 100000))

    for line in head:
        # Cheap substring check before paying for json.loads
        if 'cat' in line:
            data.append(json.loads(line))

# Build a single dataframe from all parsed records at once
df = pd.json_normalize(data, errors='ignore')
df = df[['col1', 'col2', 'col3']]
df.to_csv("output.csv", header=True, sep=';')

Why are you keeping head in memory at all? You can iterate over the file directly and stop once you reach 1 million rows.
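A minimal sketch of that streaming approach, assuming (as in the question's code) that each line is a standalone JSON object, that lines of interest contain the substring 'cat', and that col1, col2, and col3 are top-level keys; it writes each matching row with the csv module immediately, so only one line is held in memory at a time:

import csv
import json

wanted = ['col1', 'col2', 'col3']  # columns from the question; adjust as needed
limit = 1_000_000                  # stop after this many matching rows

with open("input.json") as f, open("output.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=wanted, delimiter=';')
    writer.writeheader()
    written = 0
    for line in f:
        # Same cheap substring pre-filter as in the question
        if 'cat' not in line:
            continue
        record = json.loads(line)
        # Keep only the selected attributes; a missing key becomes an empty cell
        writer.writerow({k: record.get(k, '') for k in wanted})
        written += 1
        if written >= limit:
            break

If you would rather stay in pandas, read_json can also stream a line-delimited file as an iterator of dataframes via its chunksize parameter, so no more than one chunk is ever in memory. A sketch, again assuming the selected columns exist on every line:

import pandas as pd

wanted = ['col1', 'col2', 'col3']
first = True
for chunk in pd.read_json("input.json", lines=True, chunksize=100000):
    # Append each filtered chunk to the CSV, writing the header only once
    chunk[wanted].to_csv("output.csv", sep=';', mode='a', header=first, index=False)
    first = False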
