
Python JSON to CSV: how to read a file in chunks or line by line

My code:

import json
import pandas as pd

file_path = "F:\\2\\1.json"
with open(file_path, 'r',encoding='utf-8') as fh:
    file_data = fh.readlines()

all_data = []
for data in file_data:
    data = data.strip()
    if data:
        all_data.append(json.loads(data))
df = pd.json_normalize(all_data)

df.to_csv('F:\\2\\1.csv',encoding='utf-8',index=False)

Reading a large file raises "MemoryError". How should I change my code?

My JSON file (one JSON object per line):

{"_index":"core-bvd-dmc","_type":"_doc","_id":"e22762d5c4b81fbcad62b5c1d77226ec","_score":1,"_source":{"a_id":"P305906272","a_id_type":"Contact ID","a_name":"Mr Chuanzong Chen","a_name_normal":"MR CHUANZONG CHEN","a_job_title":"Executive director and general manager","relationship":"Currently works for (Executive director and general manager)","b_id":"CN9390051924","b_id_type":"BVD ID","b_name":"Yantai haofeng trade co., ltd.","b_name_normal":"YANTAI HAOFENG TRADE CO","b_country_code":"CN","b_country":"China","b_in_compliance_db":false,"b_nationality":"CN","b_street_address":"Bei da jie 53hao 1609shi; Zhi fu qu","b_city":"Yantai","b_postcode":"264000","b_region":"East China|Shandong","b_phone":"+86 18354522225","b_email":"18354522225@163.com","b_latitude":37.511873,"b_longitude":121.396883,"b_geo_accuracy":"Community","b_national_ids":{"Unified social credit code":["91370602073035263P"],"Trade register number":["370602200112047"],"NOC":["073035263"]},"dates":{"date_of_birth":null},"file_name":"/media/hedwig/iforce/data/BvD/s3-transfer/SuperTable_v3_json/dmc/part-00020-7b09c546-2adc-413e-9e68-18b300e205cf-c000.json","b_geo_point":{"lat":37.511873,"lon":121.396883}}}
{"_index":"core-bvd-dmc","_type":"_doc","_id":"97871f8842398794e380a748f5b82ea5","_score":1,"_source":{"a_id":"P305888975","a_id_type":"Contact ID","a_name":"Mr Hengchao Jiang","a_name_normal":"MR HENGCHAO JIANG","a_job_title":"Legal representative","relationship":"Currently works for (Legal representative)","b_id":"CN9390053357","b_id_type":"BVD ID","b_name":"Yantai ji hong educate request information co., ltd.","b_name_normal":"YANTAI JI HONG EDUCATE REQUEST INFORMATION CO","b_country_code":"CN","b_country":"China","b_in_compliance_db":false,"b_nationality":"CN","b_street_address":"Ying chun da jie 131hao nei 1hao; Lai shan qu","b_city":"Yantai","b_postcode":"264000","b_region":"East China|Shandong","b_phone":"+86 18694982966","b_email":"xyw_747@163.com","b_latitude":37.511873,"b_longitude":121.396883,"b_geo_accuracy":"Community","b_national_ids":{"NOC":["597807789"],"Trade register number":["370613200023836"],"Unified social credit code":["913706135978077898"]},"dates":{"date_of_birth":null},"file_name":"/media/hedwig/iforce/data/BvD/s3-transfer/SuperTable_v3_json/dmc/part-00020-7b09c546-2adc-413e-9e68-18b300e205cf-c000.json","b_geo_point":{"lat":37.511873,"lon":121.396883}}}
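
pd.json_normalize() flattens nested objects into dot-separated column names, and the columns follow the key order inside each record. In the two sample lines above, the keys of b_national_ids come in a different order, which matters as soon as rows are appended to one CSV under a single header. A minimal check, assuming the two records are trimmed to `_id` and `_source.b_national_ids` purely for brevity:

```python
import json

import pandas as pd

# The two sample records, trimmed to _id and the nested b_national_ids object
lines = [
    '{"_id": "e22762d5c4b81fbcad62b5c1d77226ec", "_source": {"b_national_ids": '
    '{"Unified social credit code": ["91370602073035263P"], '
    '"Trade register number": ["370602200112047"], "NOC": ["073035263"]}}}',
    '{"_id": "97871f8842398794e380a748f5b82ea5", "_source": {"b_national_ids": '
    '{"NOC": ["597807789"], "Trade register number": ["370613200023836"], '
    '"Unified social credit code": ["913706135978077898"]}}}',
]

# Flatten each record on its own, as a line-by-line loop would
columns = [list(pd.json_normalize(json.loads(line)).columns) for line in lines]

for cols in columns:
    print(cols)
```

Both printed lists contain the same names (such as `_source.b_national_ids.NOC`), but in a different order.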

Try this:

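import pandas as pd

# lines=True: the file holds one JSON object per line (JSON Lines).
# Note that this still loads the whole file into memory at once.
df = pd.read_json(r'F:\2\1.json', lines=True)
df.to_csv(r'F:\2\1.csv', index=False)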
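
To actually read in chunks, pandas can do the batching itself: with lines=True and a chunksize, pd.read_json() returns an iterator of DataFrames instead of loading the whole file. The sketch below assumes pandas 1.2 or later (for using the reader in a with statement); the function name json_lines_to_csv and the default chunk size are illustrative choices, not part of pandas. Each chunk is flattened with json_normalize() and aligned to the first chunk's columns before it is appended.

```python
import pandas as pd


def json_lines_to_csv(src, dst, chunksize=10_000):
    """Write a JSON Lines file to a flattened CSV, `chunksize` records at a time.

    `json_lines_to_csv` and its defaults are illustrative, not a pandas API.
    """
    columns = None
    # lines=True: one JSON object per line; chunksize makes read_json return a
    # reader that yields DataFrames of up to `chunksize` rows.
    # dtype=False keeps values as parsed, so string IDs are not coerced to numbers.
    with pd.read_json(src, lines=True, chunksize=chunksize, dtype=False,
                      encoding="utf-8") as reader:
        for chunk in reader:
            # nested objects such as `_source` arrive as dicts; json_normalize
            # turns them into "_source.a_id"-style columns
            flat = pd.json_normalize(chunk.to_dict(orient="records"))
            if columns is None:
                # first chunk: create (or overwrite) the file and write the header
                columns = flat.columns
                flat.to_csv(dst, index=False, mode="w", encoding="utf-8")
            else:
                # align later chunks with the header; keys not seen in the
                # first chunk are dropped, missing ones become empty cells
                flat.reindex(columns=columns).to_csv(
                    dst, index=False, mode="a", header=False, encoding="utf-8")
```

For example, `json_lines_to_csv(r'F:\2\1.json', r'F:\2\1.csv')`. Memory use is bounded by the chunk size rather than the file size. Because the header comes from the first chunk, check the output if the records are not uniform.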

To avoid running out of memory, you could append to the output CSV for each input line of JSON. This should work for a file of any size. For example:

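import json
import pandas as pd

add_header = True
columns = None  # column order of the first record, reused for every row

with open('1.json', encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()

        if line:
            # flatten one record into a one-row DataFrame ("_source.a_id", ...)
            df = pd.json_normalize(json.loads(line))

            if columns is None:
                columns = df.columns
            else:
                # nested keys (e.g. b_national_ids) can come in a different order
                # per record; reindex keeps values under the right header
                # (keys absent from the first record are dropped)
                df = df.reindex(columns=columns)

            # mode='a' appends, so delete any old 1.csv before running
            df.to_csv('1.csv', index=False, mode='a', header=add_header)
            add_header = False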
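
If building a one-row DataFrame per line is too slow, the standard library's csv.DictWriter can stream rows directly. This is an alternative to the pandas approach above, not what the answer uses. The helper names flatten and json_lines_to_csv_stdlib are hypothetical. The header is taken from the first record, and since DictWriter places each value by column name, a different key order in later records cannot shift values. Keys absent from the first record are ignored (extrasaction="ignore") and missing ones are left empty (restval="").

```python
import csv
import json


def flatten(obj, prefix=""):
    """Hypothetical helper: flatten nested dicts into "_source.a_id"-style keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # recurse into nested objects, extending the dotted prefix
            flat.update(flatten(value, name + "."))
        else:
            # lists and scalars are kept as-is (csv writes None as "")
            flat[name] = value
    return flat


def json_lines_to_csv_stdlib(src, dst):
    """Hypothetical helper: stream a JSON Lines file to CSV one line at a time."""
    writer = None
    with open(src, encoding="utf-8") as f_json, \
            open(dst, "w", encoding="utf-8", newline="") as f_csv:
        for line in f_json:
            line = line.strip()
            if not line:
                continue
            row = flatten(json.loads(line))
            if writer is None:
                # the header comes from the first record's keys
                writer = csv.DictWriter(f_csv, fieldnames=list(row),
                                        extrasaction="ignore", restval="")
                writer.writeheader()
            writer.writerow(row)
```

Only one line is held in memory at a time, and the output file is overwritten on each run rather than appended to.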

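Reading and processing a batch of lines before writing the output may improve speed. Here it keeps accumulating lines until a threshold is reached: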

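import io
import pandas as pd

add_header = True
frames = []  # one-row DataFrames buffered until the next write

with open('1.json', encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()

        if line:
            # lines=True parses the line as one record (a one-row DataFrame);
            # wrapping it in StringIO avoids the pandas 2.1+ deprecation of
            # passing a literal JSON string to read_json()
            frames.append(pd.read_json(io.StringIO(line), lines=True))

            # write the buffered rows once the threshold is reached
            if len(frames) >= 10000:
                pd.concat(frames).to_csv('1.csv', index=False, mode='a', header=add_header)
                add_header = False
                frames = []

# write whatever is left after the last full batch
if frames:
    pd.concat(frames).to_csv('1.csv', index=False, mode='a', header=add_header)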

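Note: this version also just uses read_json() instead of json.loads(), so the nested _source object is not flattened into separate columns the way json_normalize() does.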


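Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.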

 
© 2020-2024 STACKOOM.COM