
Jupyter ipython kernel dies on large file loading

I have a huge binary file of size ~10 GB which I want to load into a pandas DataFrame in my Jupyter notebook. I am using the following code to create the DataFrame:

df = pd.DataFrame(np.fromfile('binary_file.dat', dtype=mydtype))  # the file has over 20 columns of dtype '<f8'

Every time I run this command, my kernel dies. On debugging, I found that the np.fromfile call goes through, but the pd.DataFrame call is the one that causes the crash. I am running this on a 4-core, 16 GB Ubuntu AWS server. I tried setting

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

as per a Stack Overflow answer, but it didn't help. How can I read this file without crashing the kernel? Is it possible to do this without increasing the server's RAM?

Any and all assistance is appreciated. Thank you.
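
One approach that may avoid holding both the full 10 GB array and a DataFrame copy in memory at once is to memory-map the file and only materialize a slice at a time. This is a minimal sketch, not the asker's setup: mydtype below is a placeholder based on the "over 20 columns of '<f8'" description and would need to match the real record layout.

import numpy as np
import pandas as pd

# Placeholder dtype: 20 float64 fields, standing in for the real record layout.
mydtype = np.dtype([(f'col{i}', '<f8') for i in range(20)])

# Memory-map the file instead of reading it all into RAM; the OS pages
# records in on demand, so no 10 GB allocation happens up front.
records = np.memmap('binary_file.dat', dtype=mydtype, mode='r')

# Materialize only a manageable slice into a DataFrame. pd.DataFrame copies
# the data it is given, so keep each slice small enough to fit in memory.
df = pd.DataFrame(records[:1_000_000])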

Try

df = pd.read_csv(r'.....\binary_file.dat', sep="however your .dat file is separated", engine='python')
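
If the .dat file is raw binary rather than delimited text, read_csv may not be able to parse it. In that case, reading the file in fixed-size chunks with np.fromfile and reducing each chunk before moving on keeps peak memory bounded. The sketch below uses the same placeholder dtype as above; the chunk size and the per-chunk aggregation are illustrative only.

import numpy as np
import pandas as pd

mydtype = np.dtype([(f'col{i}', '<f8') for i in range(20)])  # placeholder layout
chunk_rows = 1_000_000  # records per chunk; tune to the available RAM

summaries = []
with open('binary_file.dat', 'rb') as f:
    while True:
        # count= limits how many records are loaded into memory at once.
        block = np.fromfile(f, dtype=mydtype, count=chunk_rows)
        if block.size == 0:
            break
        chunk_df = pd.DataFrame(block)
        # Filter or aggregate each chunk instead of keeping all rows,
        # so the total footprint stays well below the file size.
        summaries.append(chunk_df.mean())

summary_df = pd.DataFrame(summaries)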
