Reading a JSON file into a pandas DataFrame is very slow

Question

I have a JSON file smaller than 1 GB. I am trying to read it on a server with 400 GB of RAM using the following simple command:

df = pd.read_json('filepath.json')

However, this code takes forever (several hours) to run. I tried several suggestions, such as

df = pd.read_json('filepath.json', low_memory=False)

or

df = pd.read_json('filepath.json', lines=True)

But none of them worked. Why is reading a 1 GB file so slow on a server with 400 GB of RAM?
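One way to narrow down where the hours go is to time the two stages separately: parsing the text into Python objects, and building the DataFrame from those objects. The sketch below is a hypothetical diagnostic, not something from the original thread. It assumes the file is a single JSON array of flat records; the function name `time_json_load` is made up for illustration, and nested records would need `pd.json_normalize` instead of `pd.DataFrame`.

```python
import json
import time

import pandas as pd


def time_json_load(path):
    """Hypothetical helper: time JSON parsing and DataFrame construction separately.

    Assumes `path` holds one JSON array of flat records (an assumption about
    the file layout, not something stated in the question).
    """
    t0 = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # stage 1: parse the text into Python lists/dicts
    t1 = time.perf_counter()
    df = pd.DataFrame(data)  # stage 2: build the DataFrame from those objects
    t2 = time.perf_counter()
    # Return the frame plus the seconds spent in each stage.
    return df, t1 - t0, t2 - t1
```

Running it on a sample of the real file shows which stage dominates, which in turn suggests whether the problem lies in the raw parsing or in how the parsed structure maps onto columns.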

Answer 1

You can use chunking to reduce memory use. I also recommend the Dask library, which can load data in parallel.
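A minimal sketch of the chunking idea with plain pandas, assuming the file is newline-delimited JSON (one object per line), since `pd.read_json` only accepts `chunksize` together with `lines=True`. The function name and its `keep_columns` argument are hypothetical, added for illustration; they are not pandas options.

```python
import pandas as pd


def read_json_lines_in_chunks(path, chunksize=100_000, keep_columns=None):
    """Hypothetical helper: read a JSON Lines file chunk by chunk.

    `keep_columns` is an illustrative argument, not a pandas option: when
    given, each chunk is trimmed to those columns before it is kept, so
    unneeded data is dropped early.
    """
    pieces = []
    # pandas only accepts chunksize together with lines=True; the reader
    # then yields one DataFrame of up to `chunksize` rows per iteration.
    with pd.read_json(path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            if keep_columns is not None:
                chunk = chunk[keep_columns]
            pieces.append(chunk)
    # Assumes the file has at least one line; pd.concat fails on an empty list.
    return pd.concat(pieces, ignore_index=True)
```

For the parallel route, Dask provides `dask.dataframe.read_json`; with line-delimited input, its `blocksize` argument splits the file into partitions that can be parsed in parallel. If the file is one large JSON array rather than JSON Lines, neither approach applies directly, and the file would first need converting.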

Solution 1
1 2022-02-24 14:09:06

将 json 文件读入 pandas dataframe 非常慢

问题描述

1 个解决方案

解决方案1 1 2022-02-24 14:09:06

解决方案1
1 2022-02-24 14:09:06