
Python struct.unpack(ing) when there are multiple byte-orders?

I have a function that reads a binary file and then unpacks the file's contents using struct.unpack(). My function works just fine. It is faster if/when I unpack the whole of the file using one long 'format' string. Problem is that sometimes the byte order changes partway through, so my format string (which is invalid) would look like '<10sHHb>llh' (this is just an example; they are usually way longer). Is there any ultra slick/pythonic way of handling this situation?

Nothing super-slick, but if speed counts, note that the struct module's top-level functions are wrappers that have to recheck a cache on every call for the actual struct.Struct instance corresponding to the format string. While you must make separate format strings, you might solve part of your speed problem by avoiding that repeated cache check.

Instead of doing:

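import struct

buffer = memoryview(somedata)  # somedata: the raw bytes read from the file
allresults = []
while buffer:
    # Every module-level call looks the format string up in the cache again
    allresults += struct.unpack_from('<10sHHb', buffer)
    buffer = buffer[struct.calcsize('<10sHHb'):]
    allresults += struct.unpack_from('>llh', buffer)
    buffer = buffer[struct.calcsize('>llh'):]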

You'd do:

import struct

buffer = memoryview(somedata)  # somedata: the raw bytes read from the file
# Compile each format once, outside the loop
structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')
allresults = []
while buffer:
    allresults += structa.unpack_from(buffer)
    buffer = buffer[structa.size:]
    allresults += structb.unpack_from(buffer)
    buffer = buffer[structb.size:]

No, it's not much nicer looking, and the speed gains aren't likely to blow you away. But you've got weird data, so this is the least brittle solution.
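If the formats get long, the splitting itself can be automated. The following is only a sketch: compile_mixed and unpack_mixed are hypothetical helper names, not part of the struct module, and it assumes every byte-order run in the combined string starts with one of struct's byte-order characters (@, =, <, >, !). It compiles one struct.Struct per run and walks the data with an offset instead of re-slicing the buffer.

```python
import re
import struct

def compile_mixed(fmt):
    """Split a format such as '<10sHHb>llh' into one Struct per byte-order run.

    Hypothetical helper, not part of the struct module.
    """
    # Each piece starts at a byte-order character and runs to the next one.
    pieces = re.findall(r"[@=<>!][^@=<>!]*", fmt)
    if "".join(pieces) != fmt:
        raise ValueError("format must start with a byte-order character")
    return [struct.Struct(piece) for piece in pieces]

def unpack_mixed(structs, data, offset=0):
    """Unpack one record; return (values, offset just past the record)."""
    values = []
    for s in structs:
        values.extend(s.unpack_from(data, offset))
        offset += s.size
    return tuple(values), offset

# Build one sample record in the question's layout and read it back.
structs = compile_mixed('<10sHHb>llh')
record = (struct.pack('<10sHHb', b'abcdefghij', 1, 2, -3)
          + struct.pack('>llh', 4, 5, 6))
values, end = unpack_mixed(structs, record)
```

Calling unpack_mixed again with the returned offset reads the next record. No padding is inserted between the pieces, which matches how the two-Struct loop above treats the data.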

If you want unnecessarily clever/brittle solutions, you could do this with custom ctypes Structures, nesting BigEndianStructure(s) inside a LittleEndianStructure or vice versa. For your example format:

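from ctypes import *

class BEStruct(BigEndianStructure):
    # c_int32 rather than c_long: c_long is 8 bytes on many 64-bit platforms,
    # while struct's 'l' is 4 bytes when a byte-order prefix is given
    _fields_ = [('x', 2 * c_int32), ('y', c_short)]
    _pack_ = 1  # no padding between fields

class MainStruct(LittleEndianStructure):
    _fields_ = [('a', 10 * c_char), ('b', 2 * c_ushort), ('c', c_byte), ('big', BEStruct)]
    _pack_ = 1  # no padding between fields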

would give you a structure such that you could do:

mystruct = MainStruct()
memoryview(mystruct).cast('B')[:] = bytes(range(25))

and you'd then get results in the expected order, e.g.:

>>> hex(mystruct.b[0])  # Little endian as expected in main struct
'0xb0a'
>>> hex(mystruct.big.x[0]) # Big endian from inner big endian structure
'0xf101112'

While clever in a way, it will likely run slower (ctypes attribute lookup is weirdly slow in my experience), and unlike the struct module functions, you can't just unpack into top-level named variables in a single line; it's attribute access all the way.
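For contrast, here is a small sketch of the single-line unpacking the struct approach allows, run on the same 25 sample bytes as the ctypes example. The variable names (a, b0, b1, c, x0, x1, y) are made up for illustration. It also passes an offset to unpack_from rather than slicing.

```python
import struct

structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')

data = bytes(range(25))  # same sample bytes as the ctypes example

# One line per struct, straight into named variables
a, b0, b1, c = structa.unpack_from(data, 0)
x0, x1, y = structb.unpack_from(data, structa.size)

print(hex(b0))  # 0xb0a (little endian)
print(hex(x0))  # 0xf101112 (big endian)
```

The two printed values match mystruct.b[0] and mystruct.big.x[0] from the ctypes session.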
