File reading and RE parsing

Question

I'm seeing some strange behaviour that I don't understand:

If I open my file, I find my bytes, but only one pattern at a time:

f = open('d:\BB.ki', "rb")
f10 = re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', f.read() )
print f10
['1BBBAAAABBBBAAAABBBBAAAABBBBAAAA\x00']

f = open('d:\BB.ki', "rb")
f11 = re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', f.read() )
print f11
['2AAABBBBAAAABBBBAAAA\x00']

If I open the file once and try to search for several patterns, I only get the first one (f11 is empty):

f = open('d:\BB.ki', "rb")
f10 = re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', f.read() )
f11 = re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', f.read() )
print f10,f11
['1BBBAAAABBBBAAAABBBBAAAABBBBAAAA\x00'] []

Should I use a loop, or something similar?

Thanks

Answer 1

After you call f.read() there are no more bytes left to read, so a second call to f.read() returns an empty string. Store the result of f.read() instead of reading twice:

s = f.read()
f10 = re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', s)
f11 = re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', s)
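This is easy to reproduce without the original file. Here is a minimal sketch, assuming Python 3 (where the whole pattern has to be a bytes literal) and using io.BytesIO as a stand-in for the opened file. The sample bytes are made up to resemble the data in the question. It also shows f.seek(0), which rewinds the file if you really do want to read it again:

```python
import io
import re

# Made-up sample data standing in for the real file (an assumption for illustration).
data = (b'\x03\x00\x00\x10' b'1BBBAAAA\x00' b'\xF7\x00\xF0'
        b'\x03\x00\x00\x11' b'2AAABBBB\x00' b'\xF7\x00\xF0')
f = io.BytesIO(data)  # behaves like a file opened with "rb"

first = f.read()   # returns everything and leaves the position at the end
second = f.read()  # nothing left to read, so this is b''

f.seek(0)          # rewind to the start; the next read() returns the data again
again = f.read()

# Read once, then run as many searches as needed on the stored bytes.
f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', first)
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', first)
print(second, f10, f11)
```

Storing the bytes once is usually the simpler option; seek(0) followed by another read() just pulls the same data from the file a second time.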

You may also want to scan the data just once, matching both expressions in a single pass:

matches = re.findall( b'\x03\x00\x00[\x10\x11]''(.*?)''\xF7\x00\xF0', s)

Note that the two approaches are not equivalent. If your file contains the bytes '\x03\x00\x00\x10\x03\x00\x00\x11_\xF7\x00\xF0', the method you proposed will find two overlapping matches (\x03\x00\x00\x11_ and _), whereas the single-scan approach finds only one match.
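The difference can be checked directly on that byte sequence. This is a sketch for Python 3; the `tagged` variant at the end, which also captures the marker byte, is my own addition and not part of the answer above:

```python
import re

# An \x10 marker immediately followed by an \x11 marker, one payload byte,
# then the terminator.
s = b'\x03\x00\x00\x10\x03\x00\x00\x11_\xF7\x00\xF0'

# Two separate scans: each pattern finds its own match, and the matches overlap.
f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', s)
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', s)

# One scan: the match starting at the \x10 marker consumes the \x11 marker,
# and findall only returns non-overlapping matches.
matches = re.findall(b'\x03\x00\x00[\x10\x11](.*?)\xF7\x00\xF0', s)

# Variation: capture the marker byte as well, so each result says which
# marker it came from.
tagged = re.findall(b'\x03\x00\x00([\x10\x11])(.*?)\xF7\x00\xF0', s)
print(f10, f11, matches, tagged)
```

Which behaviour is right depends on whether a marker sequence can legitimately appear inside a payload in your file format.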

Answer 2

f.read() consumes the entire file, so only f10 gets any data; the second call has nothing left to read.

Try this, maybe:

f10, f11 = [], []
for line in open('d:\BB.ki', "rb").readlines():
    # collect the matches from every line instead of overwriting them
    f10.extend(re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', line ))
    f11.extend(re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', line ))
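One caveat with going line by line on binary data: lines are split on the byte 0x0A, which can turn up anywhere in a binary file, so a record may be cut in two and never match. Separately, `.` does not match 0x0A unless re.DOTALL is passed. A small sketch with made-up data, assuming Python 3:

```python
import io
import re

# Made-up record whose payload happens to contain the byte 0x0A ('\n').
data = b'\x03\x00\x00\x10AB\nCD\xF7\x00\xF0'
pattern = b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0'

# Line by line: the record is split at 0x0A, so neither piece matches.
per_line = []
for line in io.BytesIO(data):
    per_line.extend(re.findall(pattern, line))

# Whole buffer without DOTALL: '.' will not cross the 0x0A byte.
whole = re.findall(pattern, data)

# Whole buffer with DOTALL: '.' matches every byte, including 0x0A.
whole_dotall = re.findall(pattern, data, re.DOTALL)
print(per_line, whole, whole_dotall)
```

If the payloads in your file can never contain 0x0A, none of this matters; otherwise reading the whole file once and searching with re.DOTALL is probably the safer route.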

2 answers

Answer 1
1 2012-07-05 13:30:03

Answer 2
0 2012-07-05 13:27:53
