
Number of elements of numpy arrays inside specific bins

I have an ensemble of sorted one-dimensional arrays of unequal lengths (say M0, M1 and M2). I want to find how many elements of each of these arrays lie inside specific number ranges, where the ranges are given by neighboring elements of another sorted array, say zbin. What is the fastest way to achieve this?

Here is a small example of the task, along with the method I am currently using:

import numpy as np

# Function to do the search query: for each sorted array in lst, count the
# elements that fall inside the closed interval [numrange[0], numrange[1]].
def search(numrange, lst):
    arr = np.zeros(len(lst))
    for i in range(len(lst)):
        probe = lst[i]
        count = 0
        for j in range(len(probe)):
            # probe is sorted, so nothing past the upper edge can match
            if probe[j] > numrange[1]:
                break
            if numrange[0] <= probe[j] <= numrange[1]:
                count = count + 1
        arr[i] = count
    return arr


# Some examples of sorted one-dimensional arrays of unequal lengths
M0 = np.array([5.1, 5.4, 6.4, 6.8, 7.9])
M1 = np.array([5.2, 5.7, 8.8, 8.9, 9.1, 9.2])
M2 = np.array([6.1, 6.2, 6.5, 7.2])

# Implementation and output
lst = [M0, M1, M2]
zbin = np.array([5.0, 5.5, 6.0, 6.5])
zarr = np.zeros((len(zbin)-1, len(lst)))
for i in range(len(zbin)-1):
    numrange = [zbin[i], zbin[i+1]]
    zarr[i,:] = search(numrange, lst)

print(zarr)

Output:

[[ 2.  1.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  3.]] 

Here, the final output zarr gives the number of elements of each array (M0, M1 and M2) inside each of the bins defined by zbin (viz. [5.0, 5.5], [5.5, 6.0] and [6.0, 6.5]). For example, consider the bin [5.0, 5.5]. The array M0 has 2 elements inside that bin (5.1 and 5.4), M1 has 1 element (5.2) and M2 has 0 elements. This gives the first row of zarr, i.e. [2, 1, 0]. The other rows of zarr are obtained in the same way.

In my actual task, zbin will be much longer than in this example, and there will be many more, and much larger, arrays like M0, M1, ... Mn. All of the M arrays and zbin will always be sorted. I am wondering whether the function I have designed (search()) and the method I am following are the fastest way to achieve the desired functionality. I would really appreciate any help.

We could make use of the sorted nature of the arrays and use np.searchsorted for this task, like so:

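# out[k, i] = number of elements of lst[i] inside [zbin[k], zbin[k+1]]
out = np.empty((len(zbin)-1, len(lst)), dtype=int)
for i, l in enumerate(lst):
    left_idx = np.searchsorted(l, zbin[:-1], side='left')    # first index with l[idx] >= lower edge
    right_idx = np.searchsorted(l, zbin[1:], side='right')   # first index with l[idx] > upper edge
    out[:, i] = right_idx - left_idx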
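
For anyone who wants to run this directly, here is a self-contained sketch of the same np.searchsorted idea applied to the example data from the question. The function name count_in_bins is hypothetical (it is not part of NumPy or of the original answer); it only wraps the loop so the result can be checked against the expected output.

```python
import numpy as np

def count_in_bins(arrays, edges):
    """Hypothetical helper: for each sorted array, count the elements x
    with edges[k] <= x <= edges[k+1], for every pair of neighboring edges."""
    edges = np.asarray(edges)
    out = np.empty((len(edges) - 1, len(arrays)), dtype=int)
    for i, arr in enumerate(arrays):
        # index of the first element >= each lower edge
        left_idx = np.searchsorted(arr, edges[:-1], side='left')
        # index of the first element > each upper edge
        right_idx = np.searchsorted(arr, edges[1:], side='right')
        out[:, i] = right_idx - left_idx
    return out

# Example data from the question
M0 = np.array([5.1, 5.4, 6.4, 6.8, 7.9])
M1 = np.array([5.2, 5.7, 8.8, 8.9, 9.1, 9.2])
M2 = np.array([6.1, 6.2, 6.5, 7.2])
zbin = np.array([5.0, 5.5, 6.0, 6.5])

counts = count_in_bins([M0, M1, M2], zbin)
print(counts.tolist())  # [[2, 1, 0], [0, 1, 0], [1, 0, 3]]
```

Because the lower edge uses side='left' and the upper edge uses side='right', both ends of each bin are inclusive, matching the >= and <= tests in search(). One consequence is that an element exactly equal to an interior edge is counted in both neighboring bins.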

I would guess it would be difficult to beat the performance of simply looping over each array and calling numpy.histogram. I'm guessing you haven't tried this, or you would have mentioned it!
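
As a rough sketch of what that loop might look like on the example data from the question (the variable name hist_counts is just illustrative):

```python
import numpy as np

# Example data from the question
M0 = np.array([5.1, 5.4, 6.4, 6.8, 7.9])
M1 = np.array([5.2, 5.7, 8.8, 8.9, 9.1, 9.2])
M2 = np.array([6.1, 6.2, 6.5, 7.2])
zbin = np.array([5.0, 5.5, 6.0, 6.5])

# np.histogram returns (counts, bin_edges); keep only the counts and
# stack them so that column i holds the counts for the i-th array.
hist_counts = np.column_stack(
    [np.histogram(arr, bins=zbin)[0] for arr in (M0, M1, M2)]
)
print(hist_counts.tolist())  # [[2, 1, 0], [0, 1, 0], [1, 0, 3]]
```

One caveat: numpy.histogram treats every bin except the last as half-open, [left, right), and only the last bin includes its right edge. Values outside the overall range of the edges are ignored. The result matches the question's output here because no element sits exactly on an interior edge; if one did, it would be counted only in the bin to its right, whereas search() would count it in both neighboring bins.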

It's certainly possible that you could exploit the sorted nature of the arrays to come up with a faster solution, but I'd start by comparing against the timing of that.
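
A minimal way to make that comparison is sketched below using the standard timeit module. The synthetic data (20 random sorted arrays, 200 bins) and the function names with_searchsorted and with_histogram are assumptions chosen purely for illustration; real timings depend on your array sizes, number of bins and NumPy version, so measure on data that resembles your own.

```python
import timeit
import numpy as np

# Synthetic, purely illustrative data: 20 sorted arrays of unequal lengths
rng = np.random.default_rng(0)
edges = np.linspace(0.0, 10.0, 201)
arrays = [
    np.sort(rng.uniform(0.0, 10.0, size=rng.integers(5_000, 10_000)))
    for _ in range(20)
]

def with_searchsorted(arrays, edges):
    # Exploits the fact that each array is already sorted
    out = np.empty((len(edges) - 1, len(arrays)), dtype=int)
    for i, arr in enumerate(arrays):
        out[:, i] = (np.searchsorted(arr, edges[1:], side='right')
                     - np.searchsorted(arr, edges[:-1], side='left'))
    return out

def with_histogram(arrays, edges):
    # Does not assume the input arrays are sorted
    return np.column_stack([np.histogram(arr, bins=edges)[0] for arr in arrays])

t_search = timeit.timeit(lambda: with_searchsorted(arrays, edges), number=20)
t_hist = timeit.timeit(lambda: with_histogram(arrays, edges), number=20)
print(f"searchsorted: {t_search:.4f} s, histogram: {t_hist:.4f} s (20 runs each)")
```

The two functions agree on this data because random floats essentially never land exactly on a bin edge; where they do, the two approaches treat interior edges differently, so check which convention you need before choosing on speed alone.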
