Fastest way to unpack bit (sub-byte) numbers from a file

Question

Given a file with resolution-compressed binary data, I would like to convert the sub-byte bit fields into their integer representations in Python. By this I mean I need to interpret n bits from a file as an integer.

Currently I am reading the file into bitarray objects and converting subsets of those objects into integers. The process works but is fairly slow and cumbersome. Is there a better way to do this, perhaps with the struct module?

import bitarray

bits = bitarray.bitarray()
with open('/dir/to/any/file.dat', 'rb') as f:  # open in binary mode
    bits.fromfile(f, 2)  # read 2 bytes into the bitarray

    ## bits 0:4 represent a field
    field1 = int(bits[0:4].to01(), 2)  # Converts to a string of 0s and 1s, then int()s the string

    ## bits 4:7 represent a field
    field2 = int(bits[4:7].to01(), 2)

    ## bits 7:16 represent a field
    field3 = int(bits[7:16].to01(), 2)

print("""All bits: {bits}\n\tfield1: {b1}={field1}\n\tfield2: {b2}={field2}\n\tfield3: {b3}={field3}""".format(
        bits=bits, b1=bits[0:4].to01(), field1=field1,
        b2=bits[4:7].to01(), field2=field2,
        b3=bits[7:16].to01(), field3=field3))

Outputs:

All bits: bitarray('0000100110000000')
    field1: 0000=0
    field2: 100=4
    field3: 110000000=384

Answer 1

If you are OK with using a third-party module, the bitstring module offers good representation and manipulation of bits: http://pythonhosted.org/bitstring/index.html

For instance, if you know the size of your fields you could use format strings: http://pythonhosted.org/bitstring/reading.html#reading-using-format-strings

import bitstring
bitstream = bitstring.ConstBitStream(filename='testfile.bin')
# 'uint' reads unsigned values; 'int' would read signed two's complement
field1, field2, field3 = bitstream.readlist('uint:4, uint:3, uint:9')

If you don't know your field sizes in advance, you could read in all the bits and then use slicing to extract the fields: http://pythonhosted.org/bitstring/slicing.html

import bitstring
bitstream = bitstring.ConstBitStream(filename='testfile.bin')
bits = bitstring.BitArray(bitstream)
# .uint interprets each slice as unsigned; .int would be signed two's complement
field1 = bits[0:4].uint
field2 = bits[4:7].uint
field3 = bits[7:16].uint

Just a thought, you probably already found this module.
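If an extra dependency is not an option, the same slicing idea can be sketched with the standard library only. This is a minimal sketch, assuming Python 3, fields packed most significant bit first, and unsigned values; `unpack_fields` and its `widths` parameter are hypothetical names, not part of bitstring or bitarray.

```python
def unpack_fields(data, widths):
    """Split `data` (bytes) into unsigned ints of the given bit widths, MSB first."""
    total_bits = len(data) * 8
    if sum(widths) > total_bits:
        raise ValueError("field widths exceed the number of available bits")
    value = int.from_bytes(data, "big")  # treat the whole buffer as one big integer
    fields = []
    remaining = total_bits  # bits not yet consumed, counted from the left
    for width in widths:
        remaining -= width
        # shift the field down to bit 0, then mask off everything above it
        fields.append((value >> remaining) & ((1 << width) - 1))
    return fields

# the two sample bytes from the question: 00001001 10000000
print(unpack_fields(b"\x09\x80", [4, 3, 9]))  # [0, 4, 384]
```

It converts the buffer to an integer once and then only shifts and masks, so no intermediate strings of 0s and 1s are built.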

Answer 2

This should work for your specific case:

# Bitmasks of fields 1-3; together they fit in 2 bytes
FIELD1 = 0b1111000000000000  # first 4 bits
FIELD2 = 0b0000111000000000  # next 3 bits
FIELD3 = 0b0000000111111111  # last 9 bits

def bytes_to_int(data):
    # Convert a bytes object to an unsigned int, most significant byte first
    return int.from_bytes(data, 'big')

def get_fields(f):
    chunk = bytes_to_int(f.read(2))  # read 2 bytes holding f1-f3, convert to int
    f1 = (chunk & FIELD1) >> 12  # get each field with its bitmask
    f2 = (chunk & FIELD2) >> 9
    f3 = chunk & FIELD3
    f4 = f.read(f3)  # field4 as a bytes object: the next f3 bytes of the file

    return f1, f2, f3, f4

# using your sample data
with open('file.dat', 'rb') as f:
    print(get_fields(f))  # prints (0, 4, 384, <field4 as a bytes object>)
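Since the question mentions the struct module: struct cannot address individual bits, but it can read the two bytes as one big-endian unsigned short, after which the fields come out with shifts and masks. A minimal sketch, assuming Python 3 and the 4/3/9-bit layout from the question; `unpack_header` and `unpack_records` are hypothetical helper names.

```python
import struct

def unpack_header(two_bytes):
    """Split one 2-byte record into its 4-, 3- and 9-bit unsigned fields."""
    (chunk,) = struct.unpack(">H", two_bytes)  # big-endian unsigned 16-bit int
    return chunk >> 12, (chunk >> 9) & 0b111, chunk & 0x1FF

def unpack_records(data):
    """Yield the fields of consecutive 2-byte records in `data`.

    len(data) must be a multiple of 2, otherwise struct.iter_unpack raises struct.error.
    """
    for (chunk,) in struct.iter_unpack(">H", data):
        yield chunk >> 12, (chunk >> 9) & 0b111, chunk & 0x1FF

# the two sample bytes from the question: 00001001 10000000
print(unpack_header(b"\x09\x80"))  # (0, 4, 384)
```

If the file holds many such records back to back, reading it in one call and passing the buffer to `unpack_records` avoids a separate read per record.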

Answer 1
4 2016-03-30 21:26:33

Answer 2
3 Accepted 2016-03-31 03:10:08