
Apply bincount to each row of a 2D numpy array

Is there a way to apply bincount with "axis = 1"? The desired result would be the same as the list comprehension:

import numpy as np
A = np.array([[1,0],[0,0]])
np.array([np.bincount(r,minlength = np.max(A) + 1) for r in A])

# array([[1, 1],
#        [2, 0]])

np.bincount doesn't work on a 2D array along a given axis. To get the desired effect with a single vectorized call to np.bincount, one can create a 1D array of IDs such that different rows have different IDs even if their elements are the same. This keeps elements from different rows from being binned together in a single call to np.bincount with those IDs. Such an ID array can be created with linear indexing in mind, like so:

N = A.max()+1   # number of bins per row
# offset row i by i*N so equal values in different rows get distinct IDs
ids = A + (N*np.arange(A.shape[0]))[:,None]

Then, feed the IDs to np.bincount and finally reshape back to 2D:

np.bincount(ids.ravel(),minlength=N*A.shape[0]).reshape(-1,N)
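
As a minimal sketch, the two steps above can be wrapped in one function and checked against the list comprehension from the question. The name bincount_rows is hypothetical (it is not a NumPy function), and the sketch assumes a 2D array of non-negative integers.

```python
import numpy as np

def bincount_rows(a):
    """Per-row bincount of a 2D array of non-negative ints (hypothetical helper)."""
    a = np.asarray(a)
    n_rows = a.shape[0]
    n_bins = int(a.max()) + 1
    # Shift row i by i * n_bins so equal values in different rows get distinct IDs.
    offsets = n_bins * np.arange(n_rows)[:, None]
    flat_ids = (a + offsets).ravel()
    # One flat bincount over all IDs, then fold it back into (n_rows, n_bins).
    counts = np.bincount(flat_ids, minlength=n_rows * n_bins)
    return counts.reshape(n_rows, n_bins)

A = np.array([[1, 0], [0, 0]])
result = bincount_rows(A)
print(result)
```

The minlength argument matters here: without it, the flat count would be too short to reshape whenever the largest value is missing from the last row.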

If the data is too large for this to be efficient, the issue is more likely the memory usage of the dense matrix than the numerical operations themselves. Here is an example of using a sklearn HashingVectorizer on a matrix that is too large for the bincount method (the result is a sparse matrix):

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
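# Note: with default settings the rows are L2-normalized, so the entries
# are hashed term weights rather than raw integer counts.
h = HashingVectorizer()
# Values are multiples of 10000 up to 990000, so a dense bincount
# would need 990001 columns for each of the 1000 rows.
A = np.random.randint(100,size=(1000,100))*10000
# HashingVectorizer expects text, so join each row into one string.
A_str = [" ".join([str(v) for v in i]) for i in A]

# %timeit is an IPython/Jupyter magic, not plain Python
%timeit h.fit_transform(A_str)
#10 loops, best of 3: 110 ms per loop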
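
For comparison, here is a NumPy-only sketch of the same idea of storing only the nonzero counts. It is not the HashingVectorizer approach above: it swaps in np.unique over (row, value) pairs and returns the counts as three parallel arrays. The helper name sparse_row_counts is hypothetical.

```python
import numpy as np

def sparse_row_counts(a):
    """Return (rows, values, counts) for every distinct value in each row."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    # Row index of every element, in the same order as a.ravel().
    rows = np.repeat(np.arange(n_rows), n_cols)
    # Count each distinct (row, value) pair; no dense count matrix is built.
    pairs = np.column_stack((rows, a.ravel()))
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    return uniq[:, 0], uniq[:, 1], counts

A = np.array([[0, 990000, 990000],
              [50000, 50000, 50000]])
rows, values, counts = sparse_row_counts(A)
print(rows, values, counts)
```

Memory use here grows with the number of distinct (row, value) pairs rather than with the largest value, which is the point of preferring a sparse result for data like this.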

You can use apply_along_axis. Here is an example:

import numpy as np
test_array = np.array([[0, 0, 1], [0, 0, 1]])
print(test_array)
np.apply_along_axis(np.bincount, axis=1, arr=test_array,
                    minlength=np.max(test_array) + 1)

Note that the final shape of this array depends on the number of bins. You can also pass other arguments to the function through apply_along_axis.
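
A small check, under the assumption that every row holds non-negative integers, that apply_along_axis with minlength gives the same result as the list comprehension from the question. The variable names are illustrative only.

```python
import numpy as np

test_array = np.array([[0, 0, 1],
                       [0, 0, 1],
                       [2, 2, 2]])
n_bins = int(test_array.max()) + 1

# Extra keyword arguments (here minlength) are forwarded to np.bincount.
via_apply = np.apply_along_axis(np.bincount, 1, test_array, minlength=n_bins)

# Reference result: the list comprehension from the question.
via_loop = np.array([np.bincount(r, minlength=n_bins) for r in test_array])

print(via_apply)
print(np.array_equal(via_apply, via_loop))
```

Passing minlength keeps every row's result the same length. Note that apply_along_axis still calls np.bincount once per row from Python, so it is a convenience rather than a single vectorized call.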
