Convert binary (0|1) numpy to integer or binary-string?

Is there a shortcut to convert a binary (0|1) numpy array to an integer or a binary string? For example:

b = np.array([0,0,0,0,0,1,0,1])   
  => b is 5

np.packbits(b)

works, but only for 8-bit values. If the numpy array has 9 or more elements, it generates 2 or more 8-bit values. Another option would be to return a string of 0|1 ...

What I currently do is :

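    from bitarray import bitarray
    ba = bitarray()
    # np.bool and ndarray.tostring() are gone from recent NumPy; use bool and tobytes()
    ba.pack(b.astype(bool).tobytes())
    #convert from bitarray 0|1 to integer
    result = int( ba.to01(), 2 )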

which is ugly!
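To make the np.packbits behaviour described above concrete, here is a minimal sketch. The 11-element array is only an illustrative input, and the string route assumes every element is exactly 0 or 1.

```python
import numpy as np

b = np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])  # 11 bits

# packbits pads the last byte with zeros on the right,
# so 11 bits come back as 2 separate uint8 values, not one integer
packed = np.packbits(b)

# Plain-Python alternative: build a 0|1 string, then parse it as base 2
bit_string = "".join("1" if x else "0" for x in b)
value = int(bit_string, 2)
```

The string route has no width limit because Python integers are arbitrary precision, but it loops over the elements in Python.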

One way would be to take the dot product with an array of descending powers of 2 -

b.dot(2**np.arange(b.size)[::-1])

Sample run -

In [95]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [96]: b.dot(2**np.arange(b.size)[::-1])
Out[96]: 1285

Alternatively, we could use bitwise left-shift operator to create the range array and thus get the desired output, like so -

b.dot(1 << np.arange(b.size)[::-1])
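The two one-liners above can be wrapped into a small helper. This is only a sketch: the name bits_to_int is hypothetical, and it assumes a 1-D input of 0/1 values whose result fits in a signed 64-bit integer (at most 63 bits).

```python
import numpy as np

def bits_to_int(b):
    """Interpret a 1-D array of 0/1 values as a big-endian binary number."""
    b = np.asarray(b, dtype=np.int64)
    # Weights are 2**(n-1), ..., 2, 1 so the first element is the most significant bit
    weights = 1 << np.arange(b.size - 1, -1, -1, dtype=np.int64)
    return int(b.dot(weights))

small = bits_to_int([0, 0, 0, 0, 0, 1, 0, 1])
large = bits_to_int([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
```

Each bit is multiplied by its place value and the products are summed, which is exactly what the dot product computes.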

If timings are of interest -

In [148]: b = np.random.randint(0,2,(50))

In [149]: %timeit b.dot(2**np.arange(b.size)[::-1])
100000 loops, best of 3: 13.1 µs per loop

In [150]: %timeit b.dot(1 << np.arange(b.size)[::-1])
100000 loops, best of 3: 7.92 µs per loop

Reverse process

To get the binary array back, use np.binary_repr along with np.frombuffer (np.fromstring is deprecated for binary input) -

In [96]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [97]: num = b.dot(2**np.arange(b.size)[::-1]) # integer

In [98]: np.frombuffer(np.binary_repr(num).encode(), dtype='S1').astype(int)
Out[98]: array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
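One thing to watch in the reverse direction is that leading zeros are lost unless a width is given. A minimal sketch, assuming a hypothetical helper name int_to_bits and using the width parameter of np.binary_repr:

```python
import numpy as np

def int_to_bits(num, width=None):
    """Return the bits of a non-negative integer as a uint8 array, most significant bit first."""
    # width pads the string with leading zeros so the array length is predictable
    text = np.binary_repr(num, width=width)
    return np.array([int(c) for c in text], dtype=np.uint8)

unpadded = int_to_bits(5)          # only the significant bits
padded = int_to_bits(5, width=8)   # same length as the original 8-element array
```

Passing width=b.size makes the round trip return an array of the same length as the input.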

I extended the good dot-product solution of @Divikar to run ~180x faster on my host by turning it into a single vectorized matrix multiplication. The original code, which ran one row at a time, took ~3 minutes for 100K rows of 18 columns in my pandas dataframe. Next week I need to go from 100K rows to 20M rows, so ~10 hours of running time was not going to be fast enough for me. Vectorization is the real change in the Python code. In addition, matrix multiplication can run in parallel behind the scenes on many-core processors, depending on your host configuration, especially when OpenBLAS or another BLAS is available for numpy to use. So it can use a lot of cores, if you have them.

The new, quite simple, code runs 100K rows x 18 binary columns in ~1 second of elapsed time on my host, which is "mission accomplished" for me:

def BitsToIntAFast(bits):
    '''Fast way is vectorized matmult. Pass in all rows and cols in one shot.'''
    m, n = bits.shape  # number of columns is needed, not bits.size
    a = 2**np.arange(n)[::-1]  # [::-1] reverses the powers of 2, one per column
    return bits @ a  # this matmult is the key line of code

'''I use it like this:'''
bits = d.iloc[:,4:(4+18)] # read bits from my pandas dataframe
gs = BitsToIntAFast(bits)
print(gs[:5])
gs.shape
...
d['genre'] = np.array(gs)  # add the newly computed column to pandas
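For readers without the dataframe, here is a self-contained sketch of the same idea on a small 2-D array. The function name bits_to_int_rows and the sample rows are illustrative assumptions, and the loop is included only to show that both routes agree.

```python
import numpy as np

def bits_to_int_rows(bits):
    """Convert each row of a 2-D 0/1 array to an integer with one matrix-vector product."""
    bits = np.asarray(bits)
    n = bits.shape[1]                   # one weight per column
    weights = 1 << np.arange(n)[::-1]   # 2**(n-1), ..., 2, 1
    return bits @ weights

rows = np.array([
    [0, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
])

vectorized = bits_to_int_rows(rows)

# Row-at-a-time version, for comparison only
looped = np.array([r.dot(1 << np.arange(r.size)[::-1]) for r in rows])
```

The vectorized call builds the weights once and leaves the loop over rows to numpy.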

Hope this helps.

My timeit results:

b.dot(2**np.arange(b.size)[::-1])
100000 loops, best of 3: 2.48 usec per loop

b.dot(1 << np.arange(b.size)[::-1])
100000 loops, best of 3: 2.24 usec per loop

# Precompute powers-of-2 array with a = 1 << np.arange(b.size)[::-1]
b.dot(a)
100000 loops, best of 3: 0.553 usec per loop

# using gmpy2 is slower
gmpy2.pack(list(map(int,b[::-1])), 1)
100000 loops, best of 3: 10.6 usec per loop

So if you know the size ahead of time, it's significantly faster to precompute the powers-of-2 array. But if possible, you should do all computations simultaneously using matrix multiplication like in Geoffrey Anderson's answer.
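One way to package the precomputation is a small factory that builds the weights once and returns a converter. This is a sketch under the assumption that the bit width is fixed and known up front; make_bits_to_int is a hypothetical name.

```python
import numpy as np

def make_bits_to_int(n_bits):
    """Return a converter for arrays of exactly n_bits elements."""
    weights = 1 << np.arange(n_bits)[::-1]  # computed once, reused on every call

    def convert(b):
        return b.dot(weights)

    return convert

to_int11 = make_bits_to_int(11)
result = to_int11(np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]))
```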

Using numpy for the conversion limits you to 64-bit results. If you really want to use numpy and the 64-bit limit works for you, an implementation based on np.packbits is:

import numpy as np
def bin2int(bits):
    padded = np.zeros(64, dtype=np.uint8)
    padded[64 - bits.size:] = bits  # right-align the bits in a 64-bit field
    return int(np.packbits(padded).view('>u8')[0])  # read the 8 bytes as one big-endian integer

Since you normally care about speed if you are using numpy, the fastest implementation for > 64-bit results is:

import gmpy2
def bin2int(bits):
    return gmpy2.pack(list(bits[::-1]), 1)

If you don't want to take a dependency on gmpy2, this is a little slower but has no dependencies and supports > 64-bit results:

def bin2int(bits):
    total = 0
    for shift, j in enumerate(bits[::-1]):
        if j:
            total += 1 << shift
    return total

The observant will note some similarities between the last version and other answers to this question, the main difference being the use of the << operator instead of **. In my testing this led to a significant improvement in speed.
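To illustrate the > 64-bit case, here is a sketch that runs the shift-based loop on a 70-bit input, whose value cannot fit in a 64-bit integer. The name bin2int_shift and the all-ones test array are illustrative assumptions; the string route is a second dependency-free option added for comparison.

```python
import numpy as np

def bin2int_shift(bits):
    """Sum 1 << position for every set bit; Python ints grow as needed."""
    total = 0
    for shift, bit in enumerate(bits[::-1]):
        if bit:
            total += 1 << shift
    return total

wide = np.ones(70, dtype=np.uint8)  # 70 bits, all set

exact = bin2int_shift(wide)

# Same value via a 0|1 string parsed in base 2
via_string = int("".join("1" if x else "0" for x in wide), 2)
```

Both results are ordinary Python integers, so they are exact at any width.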

def binary_converter(arr):
    total = 0
    # reversed() makes index 0 the least significant bit
    for index, val in enumerate(reversed(arr)):
        total += (val * 2**index)
    print(total)


In [14]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
In [15]: binary_converter(b)
1285
In [9]: b = np.array([0,0,0,0,0,1,0,1])
In [10]: binary_converter(b)
5

or

b = np.array([1,0,1,0,0,0,0,0,1,0,1])
sum(val * 2**index for index, val in enumerate(reversed(b)))
