How do I read and extract values from a binary file using Python code?

Question

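I'm fairly new to Python. As part of my astronomy project, I have to work with binary files (which are, of course, also new to me). I was given a binary file and a Python script that reads data from it, and my professor asked me to understand how the code operates on the binary file. I've spent several days trying to figure it out, but nothing has helped. Can anyone here help me understand the code?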

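import numpy as np

# Read the binary opacity file; `file` holds its path.
# Open in binary mode ("rb"): the data are raw bytes, not text.
with open(file, "rb") as f:
    # read the header: 16 32-bit integers; the grid sizes sit at indices 1, 4 and 7
    a = np.fromfile(f, dtype=np.int32, count=16)
    NX, NY, NZ = a[1], a[4], a[7]

    # read the time and time step (two 64-bit floats)
    time, time_step = np.fromfile(f, dtype=np.float64, count=2)

    # number of iterations (one 32-bit integer, returned as a 1-element array)
    nite = np.fromfile(f, dtype=np.int32, count=1)

    # radius array: discard 8 bytes, then read NX 64-bit floats
    trash = np.fromfile(f, dtype=np.float64, count=1)
    rad = np.fromfile(f, dtype=np.float64, count=NX)

    # phi array: discard another 8 bytes, then read NY 64-bit floats
    trash = np.fromfile(f, dtype=np.float64, count=1)
    phi = np.fromfile(f, dtype=np.float64, count=NY)

# leaving the with-block closes the file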
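To see what the reader code does byte by byte, here is a minimal sketch that writes a small fake file with the layout the code implies (16 int32 header values, two float64 values, one int32, then 8 filler bytes before each float64 array), reads it back with the same sequence of calls, and records the file position after each step. The sizes, values and file name are made up; the real file's layout is only inferred from the reading code. What the 8 skipped "trash" bytes actually hold is not stated in the question; one possible source, offered purely as a guess, is the 4-byte record-length markers that Fortran unformatted output writes at both ends of each record.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes and values: the real opacity file's contents are not known here.
NX, NY, NZ = 4, 3, 2

# 16 int32 header values, with the sizes at indices 1, 4 and 7 (as the reader assumes).
header = np.zeros(16, dtype=np.int32)
header[1], header[4], header[7] = NX, NY, NZ

path = os.path.join(tempfile.mkdtemp(), "fake_opacity.bin")
with open(path, "wb") as f:
    header.tofile(f)                                    # 16 * 4 = 64 bytes
    np.array([1.5, 0.01], dtype=np.float64).tofile(f)   # time, time_step: 2 * 8 = 16 bytes
    np.array([42], dtype=np.int32).tofile(f)            # nite: 4 bytes
    np.zeros(1, dtype=np.float64).tofile(f)             # 8 filler bytes (meaning unknown here)
    np.linspace(1.0, 2.0, NX).tofile(f)                 # rad: NX * 8 bytes
    np.zeros(1, dtype=np.float64).tofile(f)             # 8 filler bytes
    np.linspace(0.0, np.pi, NY).tofile(f)               # phi: NY * 8 bytes

# Read it back with the same sequence of calls, recording the byte position after each step.
offsets = []
with open(path, "rb") as f:
    a = np.fromfile(f, dtype=np.int32, count=16)
    offsets.append(f.tell())                            # 64
    time, time_step = np.fromfile(f, dtype=np.float64, count=2)
    offsets.append(f.tell())                            # 64 + 16 = 80
    nite = np.fromfile(f, dtype=np.int32, count=1)
    offsets.append(f.tell())                            # 80 + 4 = 84
    np.fromfile(f, dtype=np.float64, count=1)           # skip 8 bytes
    rad = np.fromfile(f, dtype=np.float64, count=a[1])
    offsets.append(f.tell())                            # 84 + 8 + 4 * 8 = 124
    np.fromfile(f, dtype=np.float64, count=1)           # skip 8 bytes
    phi = np.fromfile(f, dtype=np.float64, count=a[4])
    offsets.append(f.tell())                            # 124 + 8 + 3 * 8 = 156

print(offsets)  # [64, 80, 84, 124, 156]
```

Each np.fromfile call consumes exactly count × itemsize bytes and leaves the file positioned right after them, which is why the reads must happen in the same order, with the same dtypes, as the data were written.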

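As far as I know, the binary file contains several parameters (e.g. radius, phi, sound speed, radiation energy), each with many values. The code above extracts the values of two parameters, radius and phi, from the binary file. Both radius and phi have more than 100 values. The program works, but I can't understand how it works. Any help would be appreciated.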

Answer 1

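A binary file is basically just one long sequence of contiguous data; you need to tell np.fromfile() where to look and what type of data to expect. It is perhaps easiest to understand if you create your own file: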

import numpy as np

with open('numpy_testfile', 'wb') as f:
    ## we create a "header" line, which collects the lengths of all relevant arrays
    ## you can then use this header line to tell np.fromfile() *how long* the arrays are
    dimensions=np.array([0,10,0,0,10,0,3,10],dtype=np.int32)
    dimensions.tofile(f) ## write to file

    ## some fake data, length 10; the dtype is set explicitly because the default
    ## integer type is platform dependent (e.g. int32 on Windows with NumPy < 2)
    a=np.arange(0,10,1,dtype=np.int64)
    a.tofile(f) ## write to file
    print(a.dtype)

    b=np.arange(30,40,1,dtype=np.int64) ## more fake data, length 10
    b.tofile(f) ## write to file
    print(b.dtype)

    ##  more interesting data, this time it's of type float, length 3
    c=np.array([3.14,4.22,55.0],dtype=np.float64) 
    c.tofile(f) ## write to file
    print(c.dtype)

    a.tofile(f) ## just for fun, let's write "a" again

with open('numpy_testfile', 'rb') as f:
    ### what's important to know about this step is that
    #   numpy advances through the file automatically, i.e. it reads
    #   the first count=8 values, then the next count=10, and so on,
    #   as one continuous stream of data
    dim=np.fromfile(f,dtype=np.int32,count=8)
    print(dim) ## our header line: [ 0 10  0  0 10  0  3 10]
    a=np.fromfile(f,dtype=np.int64,count=dim[1]) ## read the dim[1]=10 numbers
    b=np.fromfile(f,dtype=np.int64,count=dim[4]) ## and the next 10
    ## now it's dim[6]=3, and the dtype is float64
    c=np.fromfile(f,dtype=np.float64,count=dim[6])
    ## read "the rest", unspecified length; let's hope it's all int64!
    d=np.fromfile(f,dtype=np.int64)

print(a)
print(b)
print(c)
print(d)
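The bytes themselves carry no type information, so the dtype you pass to np.fromfile decides what they mean. A small sketch (file name and values are made up) that reads the same 16 bytes in two ways, and uses the offset parameter of np.fromfile (available in NumPy 1.17 and later) to skip bytes from the start of the file:

```python
import os
import sys
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "reinterpret_demo.bin")

# Write two 64-bit floats: 16 bytes in total.
np.array([1.0, 2.0], dtype=np.float64).tofile(path)

# Read the 16 bytes with the dtype they were written with ...
as_float = np.fromfile(path, dtype=np.float64)
# ... and with the wrong dtype: four 32-bit integers made from the float bit patterns.
as_int32 = np.fromfile(path, dtype=np.int32)

# offset= skips bytes (here, the first 8-byte float) before reading.
second_only = np.fromfile(path, dtype=np.float64, count=1, offset=8)

print(as_float)     # [1. 2.]
print(second_only)  # [2.]
# On a little-endian machine, as_int32 holds 0, 1072693248, 0, 1073741824.
print(sys.byteorder, as_int32)
```

Nothing in the file says "these are floats"; reading them as int32 silently produces meaningless numbers rather than an error, which is why the reading code has to match the writing code exactly.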

Addendum: the numpy documentation is very explicit in discouraging the use of np.tofile() and np.fromfile():

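Do not rely on the combination of tofile and fromfile for data storage, as the binary files generated are not platform independent. In particular, no byte-order or data-type information is saved. Data can be stored in the platform independent .npy format using save and load instead.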
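For data you write yourself, a minimal sketch of the recommended alternative: np.save/np.load for a single array in .npy format, and np.savez for several named arrays in one .npz file. The arrays and file names are hypothetical stand-ins for the radius and phi data in the question; this does not help with reading the existing opacity file, whose format is fixed by whoever wrote it.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()

# Hypothetical stand-ins for the radius and phi arrays from the question.
rad = np.linspace(1.0, 2.0, 5)
phi = np.linspace(0.0, np.pi, 4)

# .npy: one array per file; dtype, shape and byte order are stored in its header.
np.save(os.path.join(tmpdir, "rad.npy"), rad)
rad_back = np.load(os.path.join(tmpdir, "rad.npy"))

# .npz: several named arrays in a single file.
np.savez(os.path.join(tmpdir, "grid.npz"), rad=rad, phi=phi)
with np.load(os.path.join(tmpdir, "grid.npz")) as grid:
    phi_back = grid["phi"]      # arrays are looked up by the keyword used in savez
    names = sorted(grid.files)

print(rad_back.dtype, rad_back.shape)  # float64 (5,)
print(names)                           # ['phi', 'rad']
```

Because the header records dtype and shape, no "count" or "dtype" bookkeeping is needed when loading.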

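Personal note: if it took you several days to understand this code, don't let that discourage you from learning Python; we all start somewhere. I'd suggest being honest with your professor about the obstacles you ran into (if it comes up in conversation), since she/he should then be able to judge properly "where you are" with programming. :-)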

Answer 2

from astropy.io import ascii

data = ascii.read('/directory/filename')
column1data = data['nameofcolumn1']  # placeholder: name of the first column
column2data = data['nameofcolumn2']  # placeholder: name of the second column

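and so on. column1data is now an array of all the values under that column header. I use this method to import SourceExtractor data files in ASCII format, and I think it is a more elegant way to import data from ASCII files.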


2 solutions

Solution 1: 0, 2019-05-19 08:08:26

Solution 2: 0, 2019-07-05 19:02:48
