
Iterate over bytes with findall

I'm trying to work with a settings file that is a binary file, to find out how it's structured so that I can get some information about file locations etc. from it.

As far as I can tell, the interesting data is either directly after or near the escape chars b'\x03\SETTING'. Here's an example with a setting I'm interested in, 'LQ':

\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01
\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a08.56d829a7_56d82956d829a0.mxf
\x03HTAPp\\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01
\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a07.56d829a6_56d82956d829a0.mxf

So it looks like each 'sentence' starts with \x03, and the path I'm looking for here is on the 8th byte after the LQ setting '\x03LQ'.

The file also has other settings that I want to capture, and each time it looks like the setting is directly after an escape char and padded by a short description of the setting and a number of bytes.

At the moment I am reading the binary and can find a specific path (but only if I already know how long it is):

with open(file, "rb") as abin:
    abin.seek(0)
    data = abin.read()
    foo = re.search(b'\x03LQ', data)
    abin.seek(foo.start() + 8)  # cursor lands on 8th byte
    eg = abin.read(32)
    # so I get the path of some length as eg.....

This is not what I want: I want to read the entire bytestring until the next escape char, then find the next occurrence of the setting and read its path.

I'm experimenting with findall(), but it just returns a list of bytes objects that are (it seems) all the same, and I don't understand how to search for each unique path and each instance of the byte string and read from some cursor position in the data. E.g.:

bar = re.findall(b'\x03LQ', data)
for bs in bar:
    foo = re.search(bs, data)
    abin.seek(foo.start() + 8)
    eg = abin.read(64)
    print('This is just the same path each time', eg)

Pointers, anyone?

The key is to look at the result of your findall(), which is just going to be:

[b'\x03LQ', b'\x03LQ', b'\x03LQ', ...]

You're only telling it to find a static string, so that's all it's going to return. To make the results useful, you can tell it to instead capture what comes after the given string. Here's an example that will grab everything after the given string until the next \x03 byte:

re.findall(rb'\x03LQ([^\x03]*)', data)

The parens tell findall() what part of the match you want, and [^\x03]* means "match any number of bytes that are not \x03". The result from your example should be:

[b'\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a08.56d829a7_56d82956d829a0.mxf\n', 
 b'\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a07.56d829a6_56d82956d829a0.mxf']
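
If you also want the position of each match (the question mentions reading from a cursor position), re.finditer() may be a better fit than findall(), since each match object carries its offsets. The following is only a sketch: the sample bytes are a shortened, made-up stand-in for the data in the question, and it assumes the padding in front of each path consists purely of NUL bytes and that neither the padding nor the path ever contains a \x03 byte.

```python
import re

# Hypothetical sample, loosely modelled on the bytes shown in the question.
# The paths are shortened placeholders, not values from the real file.
data = (
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\a08.mxf'
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\a07.mxf'
)

# Capture everything after the LQ marker up to (not including) the next \x03.
LQ_RECORD = re.compile(rb'\x03LQ([^\x03]*)')

# findall() returns only the captured group for each match.
# Assumption: the leading padding is NUL bytes only, so lstrip removes it.
paths = [raw.lstrip(b'\x00') for raw in LQ_RECORD.findall(data)]

# finditer() yields match objects, so the offset of the captured
# payload inside data is available as well.
records = [
    (m.start(1), m.group(1).lstrip(b'\x00'))
    for m in LQ_RECORD.finditer(data)
]

for offset, path in records:
    print(offset, path)
```

Because data is already fully in memory, slicing it (or using the match offsets) replaces the seek()/read() calls on the file object. If the padding can hold bytes other than NUL, the lstrip() step would need to be replaced by whatever rule the file format actually uses.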
