
Jupyter ipython kernel dies on large file loading

I have a huge binary file of size ~10 GB which I want to load into a pandas DataFrame in my Jupyter notebook. I am using the following code to create the DataFrame:

df = pd.DataFrame(np.fromfile('binary_file.dat', dtype=mydtype))  # the file has over 20 columns of dtype '<f8'

Every time I run this command, my kernel dies. On debugging, I found that the np.fromfile call goes through, but the pd.DataFrame call is the one that causes the crash. I am running this on a 4-core, 16 GB Ubuntu AWS server. I tried setting

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

as per a Stack Overflow answer, but it didn't help. How can I read this file without crashing the kernel? Is it possible to do this without increasing the server's RAM?

Any and all assistance is appreciated. Thank you.
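
One approach that may avoid holding both the full 10 GB array and a DataFrame copy in memory at once is to memory-map the file and only materialize a slice at a time. This is a minimal sketch, not the asker's setup: mydtype below is a placeholder based on the "over 20 columns of '<f8'" description and would need to match the real record layout.

import numpy as np
import pandas as pd

# Placeholder dtype: 20 float64 fields, standing in for the real record layout.
mydtype = np.dtype([(f'col{i}', '<f8') for i in range(20)])

# Memory-map the file instead of reading it all into RAM; the OS pages
# records in on demand, so no 10 GB allocation happens up front.
records = np.memmap('binary_file.dat', dtype=mydtype, mode='r')

# Materialize only a manageable slice into a DataFrame. pd.DataFrame copies
# the data it is given, so keep each slice small enough to fit in memory.
df = pd.DataFrame(records[:1_000_000])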

Try

df = pd.read_csv(r'.....\binary_file.dat', sep="however your .dat file is separated", engine='python')
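
If the .dat file is raw binary rather than delimited text, read_csv may not be able to parse it. In that case, reading the file in fixed-size chunks with np.fromfile and reducing each chunk before moving on keeps peak memory bounded. The sketch below uses the same placeholder dtype as above; the chunk size and the per-chunk aggregation are illustrative only.

import numpy as np
import pandas as pd

mydtype = np.dtype([(f'col{i}', '<f8') for i in range(20)])  # placeholder layout
chunk_rows = 1_000_000  # records per chunk; tune to the available RAM

summaries = []
with open('binary_file.dat', 'rb') as f:
    while True:
        # count= limits how many records are loaded into memory at once.
        block = np.fromfile(f, dtype=mydtype, count=chunk_rows)
        if block.size == 0:
            break
        chunk_df = pd.DataFrame(block)
        # Filter or aggregate each chunk instead of keeping all rows,
        # so the total footprint stays well below the file size.
        summaries.append(chunk_df.mean())

summary_df = pd.DataFrame(summaries)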
