Python: how can I read a huge binary file (>25GB)?
I have N-body simulation data and have to read the file in Python. Its size is over 25GB, so file.read() does not work due to lack of memory. So I wrote code like this:
import struct
import time

start_time = time.time()

with open("fullFoF_merger.cbin.z0.Run1", "rb") as mergertree:
    def param(data):
        result = {"nowhid": data[0], "nexthid": data[2], "zi": data[10],
                  "zip1": data[11], "F": data[4], "mass": data[9],
                  "dlnM": data[5], "dM": data[12], "dlnJ": data[6], "dJ": data[13],
                  "dlnspin": data[7], "spin": data[8],
                  "G": data[14], "overden": data[15]}
        return result

    num = 0
    while 1:
        num += 1
        binary_data = mergertree.read(4)
        if not binary_data:
            break
        n_max = struct.unpack('I', binary_data)
        binary_data = mergertree.read(64 * n_max[0])
        Halo = [None] * n_max[0]
        for i in range(1, n_max[0] + 1):
            data = struct.unpack("4i12f", binary_data[64 * (i - 1):64 * i])
            Halo[i - 1] = param(data)
        MergerQ = [] + Halo

print(MergerQ)
print(num)
print("\n Run time \n --- %d seconds ---" % (time.time() - start_time))
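For reference, the 64-byte read per record follows directly from the "4i12f" format string (4 four-byte ints plus 12 four-byte floats), which struct.calcsize confirms:

```python
import struct

# Each halo record is 4 ints + 12 floats, all 4 bytes each on this layout.
print(struct.calcsize("4i12f"))  # 64 bytes per record
```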
In this process, the while loop runs 45470522 times. But when I print MergerQ in Python, it shows only one record's dictionary data, like this:
[{'nowhid': 53724, 'nexthid': 21912952, 'zi': 0.019874930381774902, 'zip1': -1.6510486602783203e-05, 'F': inf, 'mass': 67336740864.0, 'dlnM': 0.0, 'dM': 0.0, 'dlnJ': 0.1983184665441513, 'dJ': 8463334768640.0, 'dlnspin': 0.19668935239315033, 'spin': 0.012752866372466087, 'G': inf, 'overden': 1.0068886280059814}]
I think it is caused by lack of memory or a memory limit on Python's variables. How can I solve this problem? Is there any way to read the whole dataset and save it in Python variables? Could parallel computing be a solution here? I will be waiting for your comments. Thank you.
This line is your problem:

MergerQ = []+Halo

You clear MergerQ on every iteration of the loop. Initialize it outside the loop and append to it instead:
num = 0
MergerQ = []
while 1:
    ...
    MergerQ += Halo
But don't expect to have enough memory to store the entire thing if your file is that big: you'll need a lot of memory, and it will take a lot of time.
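Putting the fix together, here is a minimal self-contained sketch of the corrected accumulation loop. It uses an in-memory io.BytesIO stream with two synthetic blocks in place of the real fullFoF_merger.cbin.z0.Run1 file, and a trimmed-down param(); the record layout is assumed from the question's "4i12f" format:

```python
import io
import struct

def param(data):
    # Map one unpacked 64-byte record (4 ints + 12 floats) to a dict
    # (only a few of the question's fields, for brevity).
    return {"nowhid": data[0], "nexthid": data[2], "mass": data[9]}

# Build a tiny synthetic stream: a 4-byte count header, then that many
# 64-byte records, repeated for two blocks of 3 and 2 records.
buf = io.BytesIO()
for n in (3, 2):
    buf.write(struct.pack("I", n))
    for k in range(n):
        buf.write(struct.pack("4i12f", k, 0, k + 1, 0, *([0.0] * 12)))
buf.seek(0)

MergerQ = []          # accumulator lives OUTSIDE the loop
while True:
    header = buf.read(4)
    if not header:
        break
    (n,) = struct.unpack("I", header)
    block = buf.read(64 * n)
    for i in range(n):
        data = struct.unpack("4i12f", block[64 * i:64 * (i + 1)])
        MergerQ.append(param(data))

print(len(MergerQ))   # 5: records from both blocks are kept
```

With a real 25GB file, the same loop works unchanged against the open file object, but MergerQ then holds every record at once, which is exactly the memory cost warned about above.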
Edit
It's very possible that you'll be able to run your code successfully with less physical RAM than the data needs, since your OS will likely page it out to your hard disk and fetch it when needed, but this will massively increase run time. Try running this code snippet and see what happens (forewarning: if you leave it running too long your machine will become unresponsive and will most likely need a physical reset):
a = []
while 1:
    a = [a, a]
Expect your script to behave similarly.
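To see why, a rough back-of-the-envelope estimate of the memory the question's approach needs (a sketch only: sys.getsizeof counts just the dict container, not the boxed int/float values inside it, and the overhead varies by Python version):

```python
import sys

# One record dict shaped like the question's param() output (14 keys).
record = {"nowhid": 0, "nexthid": 0, "zi": 0.0, "zip1": 0.0, "F": 0.0,
          "mass": 0.0, "dlnM": 0.0, "dM": 0.0, "dlnJ": 0.0, "dJ": 0.0,
          "dlnspin": 0.0, "spin": 0.0, "G": 0.0, "overden": 0.0}

per_record = sys.getsizeof(record)      # dict shell only, values excluded
total_records = 45470522                # at least one record per loop iteration
print(per_record, "bytes/dict, >=",
      per_record * total_records / 1e9, "GB just for the dict shells")
```

Compare that with the 64 bytes each record occupies on disk: storing every record as a Python dict multiplies the footprint several times over before the values are even counted.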