
Reading a binary file in Python

I have to read a binary file in Python. It is first written by a Fortran 90 program in this way:

open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)

I can easily read this binary file with the following IDL code:

n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu  =dblarr(n1,n2)
n   =dblarr(n1)
T   =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1

What I want to do is read this binary file with Python, but there are some problems. First of all, here is my attempt to read the file:

import numpy
from numpy import *
import struct

file='name_of_my_file'
with open(file,mode='rb') as lines:
    c=lines.read()

I try to read the first two variables:

dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])

But as you can see, I had to add two dummy variables because, somehow, the Fortran program adds the integer 8 in those positions.

The problem now arises when trying to read the other bytes. I don't get the same result as the IDL program.

Here is my attempt to read the array n:

double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])

However, when I print this array I get nonsense values. I can read the file with the IDL code above, so I know what to expect. So my question is: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this data set with Python.

What you're looking for is the struct module.

This module allows you to unpack data from byte strings, treating them as packed binary data.

You supply a format string and the bytes read from your file, and it returns the unpacked values as Python objects.

For example, using your variables:

import struct

with open('name_of_my_file', 'rb') as f:
    content = f.read()  # if this is too much data, you can supply a size to read()

# The first four doubles occupy 4 * 8 = 32 bytes
n, T, Teq, cool = struct.unpack("dddd", content[:32])

This will make n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles; conveniently, struct.unpack returns a tuple, which I assume will still work fine in your case (if not, you can convert it to a list). Keep in mind that struct.unpack needs to consume the whole string passed into it, otherwise you'll get a struct.error. So either slice your input string, or only read the number of bytes you'll use, as I said above in my code comment.

For example,

n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)
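The snippets above treat the file as a flat run of doubles, but a Fortran unformatted sequential file normally wraps every write in a leading and a trailing length marker. That would explain the integer 8 seen around n1 and n2: it is the byte length of a record holding two 4-byte integers. Below is a minimal sketch of reading one record at a time. It assumes 4-byte little-endian markers, which is common but compiler-dependent; the helper name read_record and the sample values are made up for illustration.

```python
import io
import struct


def read_record(f, marker="<i"):
    """Read one Fortran unformatted sequential record and return its payload bytes.

    Assumption: each record is framed by two identical length markers
    in the given struct format (4-byte little-endian int by default).
    """
    size = struct.calcsize(marker)
    head = f.read(size)
    if len(head) < size:
        raise EOFError("no more records")
    (nbytes,) = struct.unpack(marker, head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack(marker, f.read(size))
    if tail != nbytes or len(payload) != nbytes:
        raise ValueError("corrupt record or wrong marker format")
    return payload


# Synthetic stand-in for:  write(10) n1, n2   followed by   write(10) nH
n_demo = (0.5, 1.5, 2.5)
payloads = (struct.pack("<2i", 3, 2), struct.pack("<3d", *n_demo))
raw = b"".join(
    struct.pack("<i", len(p)) + p + struct.pack("<i", len(p)) for p in payloads
)

f = io.BytesIO(raw)
n1, n2 = struct.unpack("<2i", read_record(f))
nH = struct.unpack("<%dd" % n1, read_record(f))
print(n1, n2, nH)
```

Under the same assumption, the first record occupies 16 bytes and the next one opens with its own 4-byte marker, so the nH values would start at byte 20 rather than byte 16. That offset alone is enough to turn the unpacked doubles into nonsense.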

It looks like you are trying to read the cooling_0000x.out file generated by RAMSES.

Note that the first two integers (n1, n2) provide the dimensions of the two-dimensional tables (arrays) that follow in the body of the file. So you need to process those two integers first, before you know how much real*8 data is in the rest of the file.

scipy should be of help: it lets you read binary data of arbitrary dimensions:
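Since n1 and n2 fix the size of everything that follows, one way to check your understanding of the layout is to list the byte length of every record and compare it with n1, n2 and n1*n2 doubles. The sketch below does this on a small synthetic buffer. It assumes 4-byte little-endian record markers (compiler-dependent), and the function names and sample data are hypothetical.

```python
import io
import struct


def record_lengths(f, marker="<i"):
    """Return the payload length in bytes of every record in an open binary file.

    Assumption: records are framed by matching leading and trailing
    length markers in the given struct format.
    """
    size = struct.calcsize(marker)
    lengths = []
    while True:
        head = f.read(size)
        if not head:
            break  # clean end of file
        (nbytes,) = struct.unpack(marker, head)
        f.seek(nbytes, io.SEEK_CUR)  # skip over the payload
        (tail,) = struct.unpack(marker, f.read(size))
        if tail != nbytes:
            raise ValueError("record markers disagree; wrong size or byte order?")
        lengths.append(nbytes)
    return lengths


def fortran_record(payload):
    """Frame raw bytes the way a Fortran sequential write would (assumed format)."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker


# Synthetic file: header (n1, n2), a vector over n1, a vector over n2, an n1*n2 table
n1, n2 = 3, 2
raw = (
    fortran_record(struct.pack("<2i", n1, n2))
    + fortran_record(struct.pack("<3d", 1.0, 2.0, 3.0))
    + fortran_record(struct.pack("<2d", 10.0, 20.0))
    + fortran_record(struct.pack("<6d", *[float(k) for k in range(6)]))
)

lengths = record_lengths(io.BytesIO(raw))
doubles = [n // 8 for n in lengths[1:]]  # number of real*8 values per record
print(lengths, doubles)
```

On a real file, open it with open(path, 'rb') and pass the file object in place of the BytesIO buffer. A record holding n1*n2 doubles is a 2-D table stored in Fortran (column-major) order.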

http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
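If SciPy is installed, scipy.io.FortranFile handles the record markers for you. The following is only a sketch: it writes a tiny stand-in file and reads it back, assuming the default 4-byte markers and native byte order. The file name, array names and values are made up; for the real table you would issue one read call per write statement in the Fortran code.

```python
import os
import tempfile

import numpy as np
from scipy.io import FortranFile

# Made-up sample data standing in for the real table
n1, n2 = 3, 2
nH = np.array([0.5, 1.5, 2.5])
cool = np.arange(n1 * n2, dtype="f8").reshape((n1, n2))

path = os.path.join(tempfile.mkdtemp(), "demo_table.unf")

# Write three records: the header, a 1-D array, and a 2-D array in Fortran order
out = FortranFile(path, "w")
out.write_record(np.array([n1, n2], dtype=np.int32))
out.write_record(nH)
out.write_record(cool.flatten(order="F"))
out.close()

# Read them back, one call per record
inp = FortranFile(path, "r")
r_n1, r_n2 = inp.read_ints(np.int32)
r_nH = inp.read_reals("f8")
# Fortran stores 2-D arrays column-major, hence order="F"
r_cool = inp.read_reals("f8").reshape((r_n1, r_n2), order="F")
inp.close()

print(r_n1, r_n2)
print(r_cool)
```

If the reads fail with a size error on a real file, the marker size or byte order probably differs from the default. The header_dtype argument of FortranFile is the place to adjust that.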

If you already have this Python code, please let me know, as I was going to write it today (17Sep2014).

Rick

Did you give scipy.io.readsav a try?

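Simply read your file like this (note that readsav expects an IDL SAVE file, so the data would first have to be saved from IDL):

import scipy.io
mydict = scipy.io.readsav('name_of_file')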
