Extracting strings from binary files on Windows and Linux using Python/grep

Question

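I need my code to work on both Linux and Windows. I have a binary file with a text header that contains the Date and Time information I want to extract. An example of the extracted part (i.e. how the information is stored in the text header) is shown in the comments of the code below. The rest of my code is written in Python, so I would like to do this extraction in Python as well. On Linux, I simply use subprocess and grep (reference):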

import subprocess
hosts = subprocess.check_output("grep -E -a 'Date' /path/Bckgrnd.bip", shell=True)
sentence = hosts.decode('utf-8')
# '----------------------------  Date:09/09/2020   Time:11:26:19  ----------------------------\n  Capture Time/Date:\t11:26:17 on 09/09/2020\n----------------------------  Date:09/09/2020   Time:11:26:19  ----------------------------\n'

date = sentence[sentence.index('Date:')+5:sentence.index('Date:')+13]
time = sentence[sentence.index('Time:')+5:sentence.index('Time:')+13]
print(date, time)
# 09/09/20 11:26:19

The problem is that this fails on Windows. An alternative is to load the file in Python:

file_input = '/path/Bckgrnd.bip'
with open(file_input, 'rb') as f:
    s = f.read()
print(s.find(b'Date'))
# 498
date = s[s.find(b'Date')+5:s.find(b'Date')+13].decode('utf-8')
time = s[s.find(b'Time')+5:s.find(b'Time')+13].decode('utf-8')
print(date, time)

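This has one major problem: it has to read the entire file into memory, which is an issue if the file is large. Is there a way to work around the operating-system dependence of grep? Is there a pure-Python alternative that does not load the whole binary file?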

Update: regarding speed, I believe grep is faster than pure Python, so it is better not only in terms of memory but also in terms of speed.

Note that even grep treats this as a binary file (hence the -a flag mentioned here).

Answer 1

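Either way, you have to search the whole file; even grep does that. However, you do not have to load the whole file into memory. You can search it one line at a time.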

file_input = '/path/Bckgrnd.bip'
with open(file_input, 'rb') as f:
    # Iterating over the file object yields one line at a time;
    # f.readlines() would load every line into memory first.
    for line in f:
        if b'Date' in line:
            s = line
            date = s[s.find(b'Date')+5:s.find(b'Date')+13].decode('utf-8')
            time = s[s.find(b'Time')+5:s.find(b'Time')+13].decode('utf-8')
            print(date, time)
            break  # Only break here if you expect exactly one match
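The fixed-width slices above cut the year short (the question's output shows 09/09/20). As a hedged alternative, a bytes regular expression can capture the full fields. This is only a sketch: the pattern is an assumption based on the single sample header line in the question, and HEADER_RE and extract_date_time are hypothetical names.

```python
import re

# Pattern assumed from the sample header: "Date:09/09/2020   Time:11:26:19"
HEADER_RE = re.compile(rb'Date:(\d{2}/\d{2}/\d{4})\s+Time:(\d{2}:\d{2}:\d{2})')

def extract_date_time(lines):
    """Return (date, time) strings from the first matching line, or None."""
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            return match.group(1).decode('ascii'), match.group(2).decode('ascii')
    return None

# Stand-in for the real file; an open binary file object works the same way:
#     with open('/path/Bckgrnd.bip', 'rb') as f:
#         print(extract_date_time(f))
sample = [b'\x00\x01binary junk\n',
          b'----  Date:09/09/2020   Time:11:26:19  ----\n']
print(extract_date_time(sample))  # ('09/09/2020', '11:26:19')
```

Because the function only iterates over whatever it is given, passing the open file object keeps memory use at one line at a time.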

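You could probably also improve your date and time extraction with strptime, but I'm not sure which format you are working with, so I didn't spend any time trying that.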
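For what it's worth, turning the extracted text into a datetime object goes through datetime.strptime. The sketch below assumes the full four-digit year has been extracted and that the date is day/month/year; the sample value 09/09/2020 does not reveal the order, so date_format is left as a parameter. parse_header_timestamp is a hypothetical helper name.

```python
from datetime import datetime

def parse_header_timestamp(date_text, time_text, date_format='%d/%m/%Y'):
    """Combine the header's date and time strings into one datetime.

    date_format is an assumption; pass '%m/%d/%Y' if the file uses month/day order.
    """
    return datetime.strptime(f'{date_text} {time_text}', f'{date_format} %H:%M:%S')

stamp = parse_header_timestamp('09/09/2020', '11:26:19')
print(stamp.isoformat())  # 2020-09-09T11:26:19
```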

You say the file is binary, but you decode it as UTF-8, which makes me think it is text. Using grep also suggests text to me.

If it really is a binary file without many newlines, you can read the file one byte at a time.

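file_input = '/path/Bckgrnd.bip'
buffer = b''
with open(file_input, 'rb') as f:
    while True:
        byte = f.read(1)
        if not byte:  # End of file reached without a match
            break
        buffer = (buffer + byte)[-4:]  # Keep only the last 4 bytes seen
        if buffer == b'Date':
            # Read the next set of however many bytes you need to interpret the date and time
            break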

One last point: this will not make your program any faster, but it will reduce your memory usage.
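A middle ground between reading one byte at a time and reading the whole file is to read fixed-size chunks and carry a few bytes over between reads, so a marker split across two chunks is still found. The following is a minimal sketch of that idea, not a tuned implementation; find_in_stream and the default chunk size are assumptions made for illustration.

```python
import io

def find_in_stream(stream, needle, chunk_size=64 * 1024):
    """Return the absolute byte offset of the first needle in stream, or -1."""
    if not needle:
        raise ValueError('needle must not be empty')
    keep = len(needle) - 1  # bytes carried over so a split match is still seen
    window = b''
    window_start = 0        # absolute offset of window[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window += chunk
        index = window.find(needle)
        if index != -1:
            return window_start + index
        # Drop everything except the last `keep` bytes before reading more.
        dropped = max(len(window) - keep, 0)
        window = window[dropped:]
        window_start += dropped

# In-memory stand-in for the binary file; a tiny chunk size forces the
# marker to straddle a chunk boundary.
data = b'\x00' * 10 + b'Date:09/09/2020' + b'\xff' * 10
stream = io.BytesIO(data)
pos = find_in_stream(stream, b'Date:', chunk_size=4)
stream.seek(pos + len(b'Date:'))
print(pos, stream.read(10))  # 10 b'09/09/2020'
```

With a real file, the same function can be called on the object returned by open(file_input, 'rb'); memory use then stays bounded by roughly one chunk plus the carried-over bytes.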

Solution 1
0 Accepted 2021-02-22 21:07:37
