Convert bytes to bits in Python
I am working with Python 3.2. I need to take a hex stream as input and parse it at the bit level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b (older bitstring releases include it; in bitstring 3.0 and later, .bin returns the digits without the prefix):
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This converts your hexadecimal string to an integer, and then converts that integer to a string in which each character is 0 or 1 depending on the corresponding bit of the integer.
As pointed out in a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16)).lstrip('0b')
'11111111'
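or, more safely (lstrip('0b') removes every leading '0' and 'b' character, so bin(0) would become an empty string), this way: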
>>> bin(int('ff', base=16))[2:]
'11111111'
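Both variants drop leading zeros, so '0f' comes out as '1111' rather than '00001111'. A minimal sketch that keeps them, assuming you want exactly four bits per hex digit of the input (hex_to_bits is a hypothetical helper name):

```python
def hex_to_bits(hex_str):
    """Return the bits of hex_str as a '0'/'1' string, 4 bits per hex digit."""
    # Strip an optional 0x prefix so the digit count matches the bit count.
    digits = hex_str[2:] if hex_str.lower().startswith('0x') else hex_str
    if not digits:
        return ''
    # int(..., 16) parses the digits; zfill restores the leading zeros bin() drops.
    return bin(int(digits, 16))[2:].zfill(len(digits) * 4)


print(hex_to_bits('ff'))    # 11111111
print(hex_to_bits('0f'))    # 00001111
print(hex_to_bits('0x01'))  # 00000001
```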
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you only want bits 7 and 8 (counting from 1 at the least significant end, i.e. the top two bits of a byte), use, e.g.:
val = (byte >> 6) & 3
(That is: shift the byte 6 bits to the right, dropping the low 6 bits. Then keep only the last two bits: 3 is the number with the lowest two bits set, 0b11.)
These can easily be translated into simple CPU operations that are super fast.
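The same shift-and-mask idea generalizes to any bit field. A short sketch, where get_bits is a hypothetical helper and bits are numbered from 0 at the least significant end (so "bits 7 and 8" above are positions 6 and 7 here):

```python
def get_bits(value, start, count):
    """Return `count` bits of `value` beginning at bit `start` (0 = least significant)."""
    mask = (1 << count) - 1      # e.g. count=2 -> 0b11 == 3
    return (value >> start) & mask


data = bytes.fromhex('c5')       # 0b11000101
byte = data[0]                   # indexing bytes gives an int in Python 3
print(get_bits(byte, 6, 2))      # top two bits 0b11 -> 3
print(get_bits(byte, 0, 3))      # low three bits 0b101 -> 5
# Every bit of the byte, most significant first, with no string conversion:
print([(byte >> i) & 1 for i in range(7, -1, -1)])  # [1, 1, 0, 0, 0, 1, 0, 1]
```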
Using Python format string syntax:
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All bytes objects have a .hex() method (available since Python 3.5), which returns a hex string. We convert that hex string to an integer, telling the int() function that it's a base-16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It uses the Format Specification Mini-Language (format_spec). Specifically, it uses the width and type parts of the format_spec syntax, plus the 0 flag. The 8 sets width to 8, the leading 0 requests zero padding (which is how we get the nice 0000 padding), and the b sets the type to binary.
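To see what each part of the format spec contributes, here is a small comparison using the built-in format() (the value 5 is only illustrative). Note that it is the 0 in 08 that requests zero padding; a bare width pads with spaces:

```python
value = 5
print(repr(format(value, 'b')))      # '101'         type only, no padding
print(repr(format(value, '8b')))     # '     101'    width 8, padded with spaces
print(repr(format(value, '08b')))    # '00000101'    0 flag: pad with zeros
print(repr(format(value, '#010b')))  # '0b00000101'  '#' adds 0b, counted in the width
print(repr(f'{value:08b}'))          # '00000101'    same spec in an f-string (3.6+)
```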
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
I think the simplest option here is numpy. For example, you can read a file as bytes and then expand it to bits easily like this:
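import numpy
Bytes = numpy.fromfile(filename, dtype="uint8")  # one uint8 element per byte of the file
Bits = numpy.unpackbits(Bytes)  # 8 elements per byte, most significant bit first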
Here is how to do it using format():
print("bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector))
The 08b is important: it pads each value with leading zeros to a full 8 digits, so every byte becomes exactly 8 bits. If you don't specify it, each converted byte will have a variable bit length.
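Going the other way, a 0/1 string produced by any of the big-endian answers above can be turned back into bytes with int(..., 2) and int.to_bytes(). A minimal sketch, assuming the bit string's length is a multiple of 8 (bits_to_bytes is a hypothetical name):

```python
def bits_to_bytes(bit_str):
    """Convert a big-endian '0'/'1' string (length a multiple of 8) back to bytes."""
    if len(bit_str) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    if not bit_str:
        return b''
    # int(..., 2) parses the whole string; to_bytes restores leading zero bytes.
    return int(bit_str, 2).to_bytes(len(bit_str) // 8, 'big')


bits = ''.join(format(x, '08b') for x in b'\x00\x0fA')
print(bits)                 # 000000000000111101000001
print(bits_to_bytes(bits))  # b'\x00\x0fA'
```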
Use ord when reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or, using str.format():
'{:08b}'.format(ord(f.read(1)))
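For a whole file rather than a single byte, a generator can yield the bits chunk by chunk without building one huge string. A sketch, assuming the file is opened in binary mode ('rb'); iter_bits and the 4096-byte chunk size are hypothetical choices, and io.BytesIO stands in for a real file:

```python
import io


def iter_bits(stream, chunk_size=4096):
    """Yield the bits (ints 0/1) of a binary stream, most significant bit first."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # b'' signals end of file
            return
        for byte in chunk:     # iterating bytes yields ints 0..255
            for shift in range(7, -1, -1):
                yield (byte >> shift) & 1


f = io.BytesIO(b'\x81\x0f')   # stand-in for open(filename, 'rb')
print(list(iter_bits(f)))
# [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```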
Binary:
bin(byte)[2:].zfill(8)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001').
In case you're interested in little-endian bit order, which is useful in many cases (such as common representations of bignums), here's a snippet for that:
def bits_little_endian_from_bytes(s):
    # s is a bytes object; iterating it yields ints in Python 3, so no ord() is needed
    return ''.join(bin(x)[2:].rjust(8, '0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
    # reverse each 8-character group back to most-significant-first, then parse it
    return bytes(int(s[i:i+8][::-1], 2) for i in range(0, len(s), 8))
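If you need the bits as integers rather than a string, the same little-endian order (least significant bit of each byte first) can be produced with shifts alone. A minimal Python 3 sketch; both function names are hypothetical, and the inverse assumes a bit count that is a multiple of 8:

```python
def bits_lsb_first(data):
    """Bits of `data` as a list of ints, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def bytes_from_bits_lsb_first(bits):
    """Inverse of bits_lsb_first; len(bits) is assumed to be a multiple of 8."""
    return bytes(
        sum(bit << i for i, bit in enumerate(bits[start:start + 8]))
        for start in range(0, len(bits), 8)
    )


bits = bits_lsb_first(b'\x01\x80')
print(bits)  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(bytes_from_bits_lsb_first(bits))  # b'\x01\x80'
```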
A one-line function to convert bytes (not a string) to a list of bits. There is no endianness issue when going from one byte reader/writer to another byte reader/writer; it only arises when the source and target are bit readers and bit writers.
def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:]) for X in b])]
I came across this answer when looking for a way to convert an integer into a list of the bit positions that are set to one. This becomes very similar to this question if you first convert your hex string to an integer, e.g. int('0x453', 16).
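For the bit-position variant, here is one more sketch that is not part of the benchmark below: it repeatedly isolates the lowest set bit with x & -x. It assumes a non-negative integer (negative values would need the two's complement handling used in the functions below), and bit_positions_lowbit is a hypothetical name:

```python
def bit_positions_lowbit(val):
    """Positions of the 1 bits in a non-negative int, lowest first."""
    if val < 0:
        raise ValueError('only non-negative integers are handled here')
    positions = []
    while val:
        low = val & -val                   # isolate the lowest set bit
        positions.append(low.bit_length() - 1)
        val ^= low                         # clear it and continue
    return positions


print(bit_positions_lowbit(int('0x453', 16)))  # [0, 1, 4, 6, 10]
```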
Now, given an integer (a representation already well encoded in the hardware), I was very surprised to find that the string variants of the above solutions, using things like bin, turn out to be faster than numpy-based solutions for a single number, so I thought I'd quickly write up the results.
I wrote three variants of the function. First, using numpy:
import math
import numpy as np


def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions
Then, using string logic:
def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get the two's complement repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions
And finally, I added a third method where I precomputed a lookup table of the positions for a single byte and expanded it for larger item sizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)


def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:
    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]
    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)
    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)
    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)
    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)
    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)
The results show the str and lookup-table implementations ahead of numpy (best times of roughly 20.5 µs and 19.4 µs per loop versus 25.8 µs for numpy, each loop covering all the test values). I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs