Python: convert numpy array of signs to int and back
I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
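import numpy as np

def sign2int(s):
    s = s.copy()                  # work on a copy so the caller's array is not modified
    s[s == -1.] = 0.              # map -1 -> 0, keep 1 -> 1
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))   # build the binary digit string
    return int(bstr, 2)

def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)    # m-digit binary string, MSB first
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s == 0.] = -1.              # map 0 -> -1
    return s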
Then:
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print(i)
11
>>> s = int2sign(i, m)
>>> print(s)
[ 1. -1. 1. 1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array, too, e.g.,
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print(sign2int(s))
[5, 7]
For 1-d arrays you can use this one-liner Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for a 2-d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), ((0, 0), (8-y, 0)), 'constant'))
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1., 1.],
[ 1., 1., 1.]])
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
>>>
Then you can simply create an array of powers of two, multiply it by the binary array, and sum.
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
>>>
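For reference, the two steps above amount to the following formula. The notation is illustrative and not from the original post: $s_0, \dots, s_{m-1}$ are the signs with $s_0$ the most significant, and the second expression is the inverse for going back from $n$ to the signs.

```latex
n = \sum_{k=0}^{m-1} \frac{s_k + 1}{2}\, 2^{\,m-1-k},
\qquad
s_k = 2\left(\left\lfloor n / 2^{\,m-1-k} \right\rfloor \bmod 2\right) - 1
```

For $s = (1, -1, 1, -1)$ the sum is $8 + 0 + 2 + 0 = 10$, matching the session above.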
Then make it operate on rows by adding axis information and relying on broadcasting.
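import numpy as np

def sign2int(a):
    # work on a float copy so the in-place operations below neither
    # modify the caller's array nor fail on integer input
    a = np.array(a, dtype=float)
    # powers of two as floats, so rows longer than 63 bits do not
    # overflow int64 (float64 is only exact up to 2**53, though)
    powers = np.power(2., np.arange(a.shape[-1])[::-1])
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis = -1)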
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
>>>
I tried it on a 5 by 400 bit array (a.repeat(100) turns each of the 4 signs into 100) and it seemed fast:
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]])
>>> sign2int(b)
array([ 2.58224988e+120, 2.58224988e+120, 2.58224988e+120,
2.58224988e+120, 2.58224988e+120])
>>>
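One caveat about the float result: a float64 carries only 53 bits of mantissa, so for rows longer than about 53 signs the values above (such as 2.58224988e+120) are rounded rather than exact. If exact integers are needed, here is a minimal sketch, assuming a per-row fallback to Python's arbitrary-precision int is acceptable. The name `sign2int_exact` is hypothetical; it packs each row with np.packbits and reads the bytes with int.from_bytes.

```python
import numpy as np

def sign2int_exact(a):
    """Hypothetical helper: each row of +/-1 signs -> exact Python int, MSB first."""
    bits = np.atleast_2d(np.asarray(a) > 0)        # +1 -> True, -1 -> False
    pad = (-bits.shape[-1]) % 8                    # packbits zero-fills the last byte
    out = []
    for row in bits:
        packed = np.packbits(row)                  # 8 signs per byte, MSB first
        # read the bytes as one big-endian integer, then drop the padding bits
        out.append(int.from_bytes(packed.tobytes(), "big") >> pad)
    return out

row = np.array([1., -1., 1., 1.]).repeat(100)      # 400 signs
n = sign2int_exact(row)[0]
print(n.bit_length())                              # 400
```

The per-row Python loop is the price of exactness; for rows of at most 64 signs a fixed-width integer dtype is enough.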
I'll add the reverse if I can figure it out. The best I could do relies on some plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time, but the time still seems acceptable.
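import numpy as np

def foo(n):
    '''yields bits in increasing powers of two
    bit sequence from lsb --> msb
    '''
    while n > 0:
        n, r = divmod(n, 2)
        yield r

def int2sign(n):
    n = int(n)
    # one int8 entry per bit, least significant bit first
    a = np.fromiter(foo(n), dtype = np.int8, count = n.bit_length())
    # map bits to signs in place: 0 -> -1, 1 -> 1
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]    # reverse so the most significant bit comes first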
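For a whole sequence of integers, one way to avoid the per-integer Python loop mentioned above is to shift every integer by every bit position at once and let broadcasting build the 2-d result. This is a sketch under two assumptions: every value is non-negative and fits in a signed 64-bit integer, and the bit width `m` is known up front. `int2sign_vec` is a hypothetical name.

```python
import numpy as np

def int2sign_vec(n, m):
    """Hypothetical helper: non-negative ints (< 2**63) -> rows of m signs, MSB first."""
    n = np.asarray(n, dtype=np.int64)
    shifts = np.arange(m - 1, -1, -1, dtype=np.int64)   # m-1, ..., 1, 0
    bits = (n[..., None] >> shifts) & 1                  # shape n.shape + (m,)
    return (2 * bits - 1).astype(np.int8)                # 0 -> -1, 1 -> +1

print(int2sign_vec([5, 7], 3))
print(int2sign_vec(1324, 11))
```

Unlike the generator version, this keeps leading -1 entries, because the width m is fixed rather than taken from n.bit_length().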
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
1014
>>> a = int2sign(n)
>>> a.shape
(1014,)
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
True
>>>
Here are some vectorized versions of your functions:
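import numpy as np

def sign2int(s):
    # -1 -> 0, then join the digits into a binary string and parse it
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def int2sign(i, m):
    # m-digit binary string split into single characters, MSB first
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)

s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1, 1, -1, 1, 1])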
To use your functions on 2-d arrays, you can use the map function:
s = np.array([[1., -1., 1.], [1., 1., 1.]])
list(map(sign2int, s))
# [5, 7]
list(map(lambda x: int2sign(x, 4), [5, 7]))
# [array([-1, 1, -1, 1]), array([-1, 1, 1, 1])]
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for-loop over the exponents with a standard algorithm for the conversion, which will have at most 64 iterations for 64-bit integers. Numpy's broadcasting happens across each integer very efficiently. packbits and unpackbits are restricted to 8-bit integers (uint8); otherwise, I suspect that would've been the best (though I didn't try).
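Regarding the 8-bit restriction: packbits always produces uint8 bytes, but groups of eight bytes can be reinterpreted as wider integers. Here is a minimal sketch, assuming rows of at most 64 signs. `sign2int_packed` and `int2sign_packed` are hypothetical names. Each row is left-padded to 64 bits, packed into 8 bytes along the last axis, and then viewed as one big-endian uint64.

```python
import numpy as np

def sign2int_packed(s):
    """Hypothetical helper: rows of at most 64 +/-1 signs -> uint64, MSB first."""
    s = np.atleast_2d(s)
    m = s.shape[-1]
    # left-pad every row with zeros (False) up to exactly 64 bits
    bits = np.pad(s > 0, ((0, 0), (64 - m, 0)), mode="constant")
    # pack to 8 bytes per row, then read those 8 bytes as one big-endian uint64
    packed = np.packbits(bits, axis=-1)            # shape (rows, 8), dtype uint8
    return packed.view(">u8")[:, 0].astype(np.uint64)

def int2sign_packed(n, m):
    """Hypothetical inverse: uint64 values -> rows of m signs, MSB first."""
    # store each value big-endian, then view its 8 bytes as uint8
    as_bytes = np.asarray(n, dtype=">u8").reshape(-1, 1).view(np.uint8)
    bits = np.unpackbits(as_bytes, axis=-1)[:, 64 - m:]   # keep the last m bits
    return 2 * bits.astype(np.int8) - 1                    # 0 -> -1, 1 -> +1

s = np.array([[1., -1., 1.], [1., 1., 1.]])
codes = sign2int_packed(s)
print(codes)                          # [5 7]
print(int2sign_packed(codes, 3))
```

The `.view(">u8")` step does no arithmetic. It reinterprets each row's eight bytes as a single 64-bit integer, and that is what lets packbits reach beyond 8 bits.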
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
import numpy as np

def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def sign2int_str(s):
    return np.array(list(map(_sign2int_str, s)))

def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0), "constant", constant_values = -1)

def int2sign_str(i, m):
    return np.array(list(map(lambda x: _int2sign_str(x, m), i.astype(int).tolist()))).transpose()

def sign2int_np(s):
    # powers of two, most significant bit first
    p = np.power(2, np.arange(s.shape[-1])[::-1])
    # (s > 0) maps -1 -> 0 and 1 -> 1; weight by the powers and sum each row
    return np.sum((s > 0) * p, axis = -1)

def int2sign_np(i, m):
    i = i.astype(int)
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = 2 ** (m - 1 - k)     # weight of the k-th bit, MSB first
        S[k, :] = i // b         # floor division extracts that bit
        i = i % b                # remove it before the next iteration
    S[S == 0.] = -1.
    return S
And here is my test:
import time
import numpy as np

X = np.sign(np.random.normal(size=(5000, 20)))
N = 100
t = time.time()
for i in range(N):
    S = sign2int_np(X)
print('sign2int_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    S = sign2int_str(X)
print('sign2int_str: \t{:10.8f} sec'.format((time.time() - t)/N))
m = 20
S = np.random.randint(0, high=np.power(2,m), size=(5000,))
t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print('int2sign_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print('int2sign_str: \t{:10.8f} sec'.format((time.time() - t)/N))
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
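The numbers above come from averaging one timed loop per function with time.time(). As an alternative sketch (not what was run above), the standard-library timeit module can take the best of several repeats, which is less sensitive to background noise. `sign2int_dot` is a hypothetical stand-in for whichever function is being measured.

```python
import timeit
import numpy as np

def sign2int_dot(s):
    """Hypothetical example: rows of +/-1 signs -> int64, MSB first."""
    powers = 2 ** np.arange(s.shape[-1] - 1, -1, -1, dtype=np.int64)
    return (s > 0).astype(np.int64) @ powers

X = np.sign(np.random.normal(size=(5000, 20)))
N = 100
# best of 5 repeats, each calling the function N times; report seconds per call
per_call = min(timeit.repeat(lambda: sign2int_dot(X), number=N, repeat=5)) / N
print('sign2int_dot: \t{:10.8f} sec'.format(per_call))
```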
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays, so you'll need to reshape after unpackbits if you have a multi-dimensional array.
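Here is a minimal sketch of that round trip for a 2-d sign array (the variable names are illustrative). unpackbits returns whole bytes, so the flattened bits are trimmed to the original size before the reshape.

```python
import numpy as np

a = np.array([[1., -1., 1.], [1., 1., 1.]])

packed = np.packbits(a > 0)               # flattens, 8 signs per byte, zero-padded
print(packed)                             # [188]

bits = np.unpackbits(packed)[:a.size]     # drop the padding bits of the last byte
restored = np.where(bits.reshape(a.shape) == 1, 1., -1.)
print(restored)
```

Recent NumPy versions also accept a `count` argument to unpackbits that does the trimming.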
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
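Here is a minimal sketch of combining the two, using the standard-library zlib module on the packed bytes. The biased example data are made up for illustration; real savings depend entirely on the data.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# illustrative data: signs that are +1 about 95% of the time (a biased source)
signs = np.where(rng.random(100_000) < 0.95, 1., -1.)

packed = np.packbits(signs > 0)                  # 1 bit per sign
compressed = zlib.compress(packed.tobytes(), 9)  # level 9 = best compression

print(signs.nbytes, packed.nbytes, len(compressed))

# decompress and unpack back to signs
bits = np.unpackbits(np.frombuffer(zlib.decompress(compressed), dtype=np.uint8))
restored = np.where(bits[:signs.size] == 1, 1., -1.)
assert np.array_equal(restored, signs)
```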