
Extracting bits from bytes

I am handling compressed data. The format contains a lookup table and an array of long ints, each of which may contain multiple values. The bit length of the contained values varies depending on the file. I have access to these longs as bytes: is there an easy way, within the standard library, to access a particular bit or bit range, or do I have to write one from scratch?

Note that I may have to continue into the next long value when the bit length isn't a factor of 64.


Theoretical example of what the code needs to do:

  • Take the long integer 4503672641818897L
  • Convert it to bits (should return 0000000000010000000000000001000100000000000000000001000100010001)
  • Read the lookup table and determine how long the values are (let's say 5 bits this time)
  • Read the 6th value, bits 25 - 29 (00100)
  • Return the int value 4
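
For the case noted above, where a value continues into the next long, one possible approach relies on Python integers having arbitrary precision: join the longs into a single integer, then shift and mask. This is only a minimal sketch. The helper name `read_value` is hypothetical, and it assumes the values are packed back to back, most significant bit first, with no padding between longs; the post does not specify the layout.

```python
def read_value(words, index, bit_length, word_bits=64):
    """Return the index-th bit_length-wide value from a list of unsigned words.

    Hypothetical helper: assumes values are packed back to back,
    most significant bit first, with no padding between words.
    """
    # Join all words into one arbitrary-precision integer.
    combined = 0
    for word in words:
        combined = (combined << word_bits) | word
    total_bits = word_bits * len(words)
    # Distance of the value's last bit from the right-hand end.
    shift = total_bits - (index + 1) * bit_length
    if shift < 0:
        raise IndexError("value extends past the end of the data")
    return (combined >> shift) & ((1 << bit_length) - 1)


words = [4503672641818897, 1 << 63]
print(read_value(words, 5, 5))   # 6th value, bits 25-29 of the first long -> 4
print(read_value(words, 12, 5))  # bits 60-64, straddling both longs -> 3
```

Joining every word on each call is wasteful for large arrays; in that case it may be enough to join only the one or two words that the requested value touches.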

Here is a solution that does not depend on external libraries like numpy or on string conversions:

def get_bits(num, start, end, length=64):
    '''Like bits(num)[start:end] interpreted as int'''
    mask = 2**(end-start)-1
    shift = length - (end-start) - start
    return (num & (mask << shift)) >> shift


print(get_bits(17, 0, 3, length=6))  # 010001[0:3] -> 010 = 2
print(get_bits(17, 3, 6, length=6))  # 010001[3:6] -> 001 = 1
print(get_bits(17, 0, 6, length=6))  # 010001[0:6] -> 010001 = 17
print(get_bits(4503672641818897, 25, 30))  # ...[25:30] -> 00100 = 4

Explanation:

  • mask = 2**(end-start)-1 : end-start is the number of bits to select (N). 2**N is a one followed by N zeros (2**3 -> 1000), so 2**N - 1 is N ones (1000 - 1 = 111).
  • shift = length - (end-start) - start : the number of bits to shift the mask to the left (111 << 3 = 111000), and also the number of bits to shift the result back to the right. 010001 & 111000 is 010000, but we only want the first three bits, and 010000 >> 3 is 010.
  • return (num & (mask << shift)) >> shift : now we put it all together.
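
To make the three steps concrete, here is a small trace of the first example (17 as a 6-bit number, bits 0 to 3) that prints each intermediate value in binary. The `show` helper is only there for display and is not part of the solution.

```python
num, start, end, length = 17, 0, 3, 6    # the first example: 010001[0:3]

mask = 2 ** (end - start) - 1            # three ones
shift = length - (end - start) - start   # three places
selected = num & (mask << shift)         # keep only the wanted bits
result = selected >> shift               # move them down to the right-hand end


def show(value, width=length):
    """Format value as a zero-padded binary string (display helper only)."""
    return format(value, "0{}b".format(width))


print(show(mask, end - start))    # 111
print(show(mask << shift))        # 111000
print(show(selected))             # 010000
print(show(result, end - start))  # 010
```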

Try the numpy function unpackbits: https://numpy.org/doc/stable/reference/generated/numpy.unpackbits.html

import numpy
Bits = numpy.unpackbits(Bytes)  # Bytes must be a numpy array of dtype uint8

What about this solution:

unpacked = "{0:b}".format(long_int)
unpacked = "0"*(64-len(unpacked)) + unpacked
int(unpacked[25:30],2)

EDIT: DOES NOT WORK! The int constructor assumes a signed int, and there is no way to tell it to construct a uint.
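
The post does not say which input made this fail. One guess, and it is only an assumption, is that a long arrived as a negative (signed) value, in which case "{0:b}" produces a leading minus sign rather than a 64-bit pattern. A possible workaround is to mask the value to 64 bits before formatting it. The function name `bits_from_string` below is hypothetical.

```python
def bits_from_string(long_int, start, end, width=64):
    """Hypothetical variant of the string approach that tolerates negative input."""
    # Reinterpret the value as an unsigned width-bit pattern (two's complement).
    unsigned = long_int & ((1 << width) - 1)
    # Zero-pad to the full width in one step.
    unpacked = format(unsigned, "0{}b".format(width))
    return int(unpacked[start:end], 2)


print(bits_from_string(4503672641818897, 25, 30))  # -> 4
print(bits_from_string(-1, 0, 5))                  # all bits set -> 31
```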

Here's a hacky solution I found. It seems very unwieldy, though.

def bitValue(byteValue, start, length):
    """Extract length bits from byteValue at start, and return them as an integer"""
    return int(bin(byteValue).lstrip('0b').rjust(64, '0')[start:start+length], 2)


 