简体   繁体   中英

Convert bytes to bits in python

I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used

bytes.fromhex(input_str)

to convert the string to actual bytes. Now how do I convert these bytes to bits?

Another way to do this is by using the bitstring module:

>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'

And if you need to strip the leading 0b :

>>> c.bin[2:]
'11111111'

The bitstring module isn't a requirement, as jcollado 's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:

>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'

etc.

What about something like this?

>>> bin(int('ff', base=16))
'0b11111111'

This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.

As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:

>>> bin(int('ff', base=16)).lstrip('0b')
'11111111'

or this way:

>>> bin(int('ff', base=16))[2:]
'11111111'

Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.

If you want bit 7 and 8 only, use eg

val = (byte >> 6) & 3

(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3 is the number with the first two bits set...)

These can easily be translated into simple CPU operations that are super fast.

using python format string syntax

>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111

The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec . Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.

I prefer this method over the bin() method because using a format string gives a lot more flexibility.

I think simplest would be use numpy here. For example you can read a file as bytes and then expand it to bits easily like this:

Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)

Here how to do it using format()

print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)

It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.

Use ord when reading reading bytes:

byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix

Or

Using str.format() :

'{:08b}'.format(ord(f.read(1)))

二进制:

bin(byte)[2:].zfill(8)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]

Will give:

['0b1000001', '0b1000010', '0b1000011']

The other answers here provide the bits in big-endian order ( '\x01' becomes '00000001' )

In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc - here's a snippet for that:

def bits_little_endian_from_bytes(s):
    return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)

And for the other direction:

def bytes_from_bits_little_endian(s):
    return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))

One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.

def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]

I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16) .

Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.

I wrote three variants of the function. First using numpy:

import math
import numpy as np
def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions

Then using string logic:

def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get twos compliment repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions

And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.

BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos  for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)


def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions

The benchmark code is as follows:

def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:

    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]

    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)

    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)

    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)

    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)

    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)

And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.

Timed str for: 10000 loops, best of 10
    time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
    time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
    time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM