Convert bytes to bits in Python

I am working with Python 3.2. I need to take a hex stream as input and parse it at the bit level. So I used

bytes.fromhex(input_str)

to convert the string to actual bytes. Now how do I convert these bytes to bits?

Another way to do this is by using the bitstring module:

>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'

And if you need to strip the leading 0b (newer bitstring releases may already omit it):

>>> c.bin[2:]
'11111111'

The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:

>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'

etc.

What about something like this?

>>> bin(int('ff', base=16))
'0b11111111'

This converts your hexadecimal string to an integer, and that integer to a string in which each character is 0 or 1 according to the corresponding bit of the integer.

As pointed out in a comment, if you need to get rid of the 0b prefix, you can do it this way:

>>> bin(int('ff', base=16)).lstrip('0b')
'11111111'

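or, more safely, by slicing (lstrip('0b') removes every leading '0' and 'b' character, so it turns '0b0' into an empty string):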

>>> bin(int('ff', base=16))[2:]
'11111111'
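One thing to watch with bin() is that it drops leading zero bits, so '0f' comes out as '1111'. A minimal sketch of one way to keep the full width, assuming every hex digit should map to exactly four bits (hex_to_bits is a hypothetical helper name, not part of any library):

```python
def hex_to_bits(hex_str):
    """Return the bits of hex_str as a string, keeping leading zeros."""
    width = len(hex_str) * 4  # each hex digit encodes 4 bits
    # bin() drops leading zeros, so pad back to the expected width
    return bin(int(hex_str, 16))[2:].zfill(width)


print(hex_to_bits('ff'))  # 11111111
print(hex_to_bits('0f'))  # 00001111
```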

Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.

If you want bits 7 and 8 only (the two most significant bits of a byte, counting from 1), use e.g.

val = (byte >> 6) & 3

(That is: shift the byte 6 bits to the right, dropping those bits. Then keep only the last two bits; 3, binary 11, is the number with only the lowest two bits set.)

These can easily be translated into simple CPU operations that are super fast.
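The shift-and-mask idea generalizes to any run of bits. The following is a rough sketch rather than a definitive implementation; get_bits and its parameter names are made up for illustration:

```python
def get_bits(value, shift, count):
    """Return `count` bits of `value`, starting `shift` bits above the lowest bit."""
    mask = (1 << count) - 1          # e.g. count=2 gives 0b11 == 3
    return (value >> shift) & mask


data = bytes.fromhex('c5')           # 0xc5 == 0b11000101
byte = data[0]                       # indexing bytes yields an int in Python 3
print(get_bits(byte, 6, 2))          # 3, the top two bits (0b11)
print(get_bits(byte, 0, 3))          # 5, the low three bits (0b101)
```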

Using Python format string syntax:

>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111

The second line is where the magic happens. All bytes objects have a .hex() method (available since Python 3.5), which returns a hex string. We convert this hex string to an integer, telling int() that it is a base-16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It uses the Format Specification Mini-Language (format_spec), specifically the width and type parts of the syntax. The 08 sets the width to 8 with zero padding, which is how we get the nice 0000 prefix, and the b sets the type to binary.

I prefer this method over the bin() method because using a format string gives a lot more flexibility.
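For inputs longer than one byte, the width has to grow with the data. Here is a small sketch of that, assuming big-endian byte order is wanted; it uses int.from_bytes (available since Python 3.2) instead of .hex(), and bytes_to_bitstring is a hypothetical name:

```python
def bytes_to_bitstring(data):
    """Return all bits of a bytes object as one zero-padded string."""
    if not data:
        return ''
    width = 8 * len(data)                        # 8 bits per byte
    value = int.from_bytes(data, 'big')          # most significant byte first
    return format(value, '0{}b'.format(width))   # pad with zeros to full width


print(bytes_to_bitstring(bytes.fromhex('0fa0')))  # 0000111110100000
```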

I think the simplest approach here is to use numpy. For example, you can read a file as bytes and then easily expand it to bits like this:

import numpy

Bytes = numpy.fromfile(filename, dtype="uint8")  # filename: path to the input file
Bits = numpy.unpackbits(Bytes)  # one array element per bit

Here is how to do it using format():

print("bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector))

The 08b is important: it pads each value with leading zeros to a width of 8, so every byte is written out in full. If you don't specify it, each converted byte will have a variable bit length.

Use ord when reading bytes:

byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix

Or, using str.format():

'{:08b}'.format(ord(f.read(1)))
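A sketch of how this might look when reading a whole stream byte by byte. It assumes a binary stream; io.BytesIO stands in for a file opened with mode 'rb', and read_bits is a hypothetical name:

```python
import io


def read_bits(stream):
    """Yield one 8-character bit string per byte read from a binary stream."""
    while True:
        chunk = stream.read(1)
        if not chunk:                      # empty bytes object means end of stream
            break
        # indexing a bytes object gives an int in Python 3, so ord() is optional
        yield '{:08b}'.format(chunk[0])


stream = io.BytesIO(b'\x01\xff')           # stands in for open(path, 'rb')
print(list(read_bits(stream)))             # ['00000001', '11111111']
```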

To binary:

bin(byte)[2:].zfill(8)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]

Will give:

['0b1000001', '0b1000010', '0b1000011']

The other answers here provide the bits in big-endian order ('\x01' becomes '00000001').

In case you're interested in little-endian bit order, which is useful in many cases, such as common representations of bignums, here's a snippet for that:

def bits_little_endian_from_bytes(s):
    # s is a bytes object; iterating over it yields ints in Python 3, so no ord()
    return ''.join(bin(x)[2:].rjust(8, '0')[::-1] for x in s)

And for the other direction:

def bytes_from_bits_little_endian(s):
    # build a bytes object (Python 3) from each reversed group of 8 bits
    return bytes(int(s[i:i+8][::-1], 2) for i in range(0, len(s), 8))
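To illustrate why this order is convenient for bignums: in a little-endian bit string, the character at index i has weight 2**i. A short Python 3 check of that property, written inline so it does not depend on the helpers above:

```python
data = b'\x01\x02'

# Little-endian bit string: each byte's bits are reversed, lowest bit first
bits = ''.join(format(b, '08b')[::-1] for b in data)
print(bits)  # 1000000001000000

# The character at index i contributes 2**i to the value
value = sum(1 << i for i, bit in enumerate(bits) if bit == '1')
print(value)  # 513

# Same number as reading the bytes as a little-endian integer
print(value == int.from_bytes(data, 'little'))  # True
```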

One-line function to convert bytes (not a string) to a list of bits. There is no endianness issue when data goes from one byte reader/writer to another byte reader/writer; it only arises when the source and target are bit readers and bit writers.

def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
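For comparison, the same list can be built without going through strings, using shifts and masks. This is only a sketch of an alternative, with bytes_to_bit_list as a hypothetical name; it assumes the most significant bit of each byte should come first, matching the function above:

```python
def bytes_to_bit_list(data):
    """Return the bits of a bytes object as a list of ints, MSB of each byte first."""
    # for each byte, pick out bit 7 down to bit 0 with a shift and a mask
    return [(b >> i) & 1 for b in data for i in range(7, -1, -1)]


print(bytes_to_bit_list(b'\x05'))  # [0, 0, 0, 0, 0, 1, 0, 1]
```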

I came across this question when looking for a way to convert an integer into a list of the bit positions where the bit is equal to one. This becomes very similar to this question if you first convert your hex string to an integer, as in int('0x453', 16).

Now, given an integer, a representation already well encoded in hardware, I was very surprised to find that the string variants of the above solutions (using things like bin) turn out to be faster than numpy-based solutions for a single number, so I thought I'd quickly write up the results.

I wrote three variants of the function. First using numpy:

import math
import numpy as np
def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions

Then using string logic:

def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get the two's complement representation
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions

And finally, I added a third method where I precomputed a lookup table of the positions for a single byte and extended it to handle larger item sizes.

BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos  for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)


def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions

The benchmark code is as follows:

def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:

    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]

    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)

    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)

    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)

    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)

    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)

And it clearly shows that the str and lookup-table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.

Timed str for: 10000 loops, best of 10
    time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
    time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
    time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
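For non-negative integers there is also a variant that stays entirely at the integer level. It is a different technique from the three functions timed above (it repeatedly isolates the lowest set bit), it was not part of that benchmark, and bit_positions_int is a hypothetical name. Treat it as a sketch only:

```python
def bit_positions_int(val):
    """Return positions of the set bits in a non-negative int, lowest first."""
    if val < 0:
        raise ValueError("this sketch only handles non-negative integers")
    positions = []
    while val:
        lowest = val & -val                        # isolate the lowest set bit
        positions.append(lowest.bit_length() - 1)  # its zero-based position
        val ^= lowest                              # clear that bit and continue
    return positions


print(bit_positions_int(int('0x453', 16)))  # [0, 1, 4, 6, 10]
```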
