Reading json file into pandas dataframe is very slow
I have a JSON file of size less than 1 GB. I am trying to read the file on a server that has 400 GB of RAM using the following simple command:
df = pd.read_json('filepath.json')
However, this code takes forever (several hours) to execute. I tried several suggestions, such as
df = pd.read_json('filepath.json', low_memory=False)
or
df = pd.read_json('filepath.json', lines=True)
But none of them worked. How come reading a 1 GB file into a server with 400 GB of RAM is so slow?
You can use chunking to shrink memory use. I also recommend the Dask library, which can load the data in parallel.