简体   繁体   中英

Converting bits to bytes in Python

I am trying to convert a bit string into a byte string, in Python 3.x. In each byte, bits are filled from high order to low order. The last byte is filled with zeros if necessary. The bit string is initially stored as a "collection" of booleans or integers (0 or 1), and I want to return a "collection" of integers in the range 0-255. By collection, I mean a list or a similar object, but not a character string: for example, the function below returns a generator.

So far, the fastest I am able to get is the following:

def bitsToBytes(a):
    s = i = 0
    for x in a:
        s += s + x
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)

I have tried several alternatives: using enumerate, bulding a list instead of a generator, computing s by "(s << 1) | x" instead of the sum, and everything seems to be a bit slower. Since this solution is also one of the shortest and simplest I found, I am rather happy with it.

However, I would like to know if there is a faster solution. Especially, is there a library routine the would do the job much faster, preferably in the standard library?


Example input/output

[] -> []
[1] -> [128]
[1,1] -> [192]
[1,0,0,0,0,0,0,0,1] -> [128,128]

Here I show the examples with lists. Generators would be fine. However, string would not, and then it would be necessary to convert back and foth between list-like data an string.

The simplest tactics to consume bits in 8-er chunks and ignore exceptions:

def getbytes(bits):
    done = False
    while not done:
        byte = 0
        for _ in range(0, 8):
            try:
                bit = next(bits)
            except StopIteration:
                bit = 0
                done = True
            byte = (byte << 1) | bit
        yield byte

Usage:

lst = [1,0,0,0,0,0,0,0,1]
for b in getbytes(iter(lst)):
    print b

getbytes is a generator and accepts a generator, that is, it works fine with large and potentially infinite streams.

Step 1: Add in buffer zeros

Step 2: Reverse bits since your endianness is reversed

Step 3: Concatenate into a single string

Step 4: Save off 8 bits at a time into an array

Step 5: ???

Step 6: Profit

def bitsToBytes(a):
    a = [0] * (8 - len(a) % 8) + a # adding in extra 0 values to make a multiple of 8 bits
    s = ''.join(str(x) for x in a)[::-1] # reverses and joins all bits
    returnInts = []
    for i in range(0,len(s),8):
        returnInts.append(int(s[i:i+8],2)) # goes 8 bits at a time to save as ints
    return returnInts

Using itertools ' grouper()` recipe :

from functools import reduce
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

bytes = [reduce(lambda byte, bit: byte << 1 | bit, eight_bits)
         for eight_bits in grouper(bits, 8, fillvalue=0)]

Example

[] -> []
[1] -> [128]
[1, 1] -> [192]
[1, 0, 0, 0, 0, 0, 0, 0, 1] -> [128, 128]

If input is a string then a specialized solution might be faster:

>>> bits = '100000001'
>>> padded_bits = bits + '0' * (8 - len(bits) % 8)
>>> padded_bits
'1000000010000000'
>>> list(int(padded_bits, 2).to_bytes(len(padded_bits) // 8, 'big'))
[128, 128]

The last byte is zero if len(bits) % 8 == 0 .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM