
Processing 50 GB of JSON into Pandas Dataframe

Original 2017-07-27 20:25:07 7 1 python/ json/ pandas

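I have about 6,000 JSON files totalling roughly 50 GB, which I am currently loading into a pandas DataFrame using the following method. (The format_pandas function sets up my pandas DataFrame as each JSON row is read):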

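import glob
import json
import os
from pathlib import Path

import pandas as pd

# format_pandas is the asker's own helper (not shown); it is assumed to append rows to `records`.
path = '/Users/shabina.rayan/Desktop/Jupyter/Scandanavia Weather/Player  Data'
records = []
for filename in glob.glob(os.path.join(path, '*.JSON')):
    file = Path(filename)
    with open(file) as json_data:
        j = json.load(json_data)
        format_pandas(j)
pandas_json = json.dumps(records)
df = pd.read_json(pandas_json, orient="records")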

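As you can guess, this takes a very long time to process my data. Does anyone have suggestions for another way I can process 50 GB of JSON files and visualize/analyze them?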

1 Answer

Dump it into Elasticsearch and run queries as needed.
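The answer does not say how to do the dump, so here is one possible sketch rather than the answerer's method. Elasticsearch's `_bulk` endpoint accepts newline-delimited JSON in which each document is preceded by an action line. The helper below turns a set of JSON files into such bodies, a fixed number of documents at a time, so the whole 50 GB never has to sit in memory. The function name `iter_bulk_chunks`, the index name, and the chunk size are hypothetical, and the code assumes each file holds either a single JSON object or a JSON array of objects.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_bulk_chunks(paths: Iterable[Path], index: str,
                     docs_per_chunk: int = 500) -> Iterator[str]:
    """Yield newline-delimited JSON bodies for Elasticsearch's _bulk endpoint.

    Assumption: every file contains one JSON object or a JSON array of objects.
    """
    # The same action line precedes every document: "index this into `index`".
    action = json.dumps({"index": {"_index": index}})
    lines = []
    count = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        docs = data if isinstance(data, list) else [data]
        for doc in docs:
            lines.append(action)
            lines.append(json.dumps(doc))  # no indent, so one document per line
            count += 1
            if count == docs_per_chunk:
                # A bulk body must end with a trailing newline.
                yield "\n".join(lines) + "\n"
                lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded string could then be sent as the body of a `POST` to the cluster's `/_bulk` endpoint with the header `Content-Type: application/x-ndjson`. The official Python client's `elasticsearch.helpers.bulk` is an alternative that handles the batching itself. Only one file is parsed at a time, but a single very large file would still be loaded whole by `json.load`.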
