
numpy: in a sorted list, find the first and the last index for each unique value

Given a sorted list, how can I find (using numpy) the first and the last index of each unique value?

Example:

Initial sorted list:

>>> import numpy as np
>>> initial_list = np.array([1, 3, 2, 3, 0, 3, 0, 1, 0])
>>> initial_list.sort()

>>> initial_list
array([0, 0, 0, 1, 1, 2, 3, 3, 3])

The expected result, with one entry per element of the sorted list, would be:

first: [0, 0, 0, 3, 3, 5, 6, 6, 6]

last: [2, 2, 2, 4, 4, 5, 8, 8, 8]

Thank you in advance.

Here's one approach that leverages the sorted nature of the input data, using efficient NumPy array slicing and a few other NumPy functions -

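def start_stop_arr(initial_list):
    a = np.asarray(initial_list)
    # True at the start of each run of equal values, plus a sentinel past the end
    mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
    idx = np.flatnonzero(mask)            # run start positions, followed by len(a)
    lens = np.diff(idx)                   # length of each run
    start = np.repeat(idx[:-1], lens)     # first index, repeated once per element
    stop = np.repeat(idx[1:] - 1, lens)   # last index, repeated once per element
    return start, stop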
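
To see why this works, it can help to look at the intermediate arrays for the sample input from the question. The sketch below simply replays the same steps outside the function; the name `lens` for the run lengths is only illustrative.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 2, 3, 3, 3])

# True wherever a new run of equal values begins; the trailing True is a
# sentinel marking the position one past the end of the array.
mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
idx = np.flatnonzero(mask)   # run boundaries
lens = np.diff(idx)          # run lengths

print(idx)    # [0 3 5 6 9]
print(lens)   # [3 2 1 3]

# Repeat each run's first and last index once per element of that run.
start = np.repeat(idx[:-1], lens)
stop = np.repeat(idx[1:] - 1, lens)
print(start)  # [0 0 0 3 3 5 6 6 6]
print(stop)   # [2 2 2 4 4 5 8 8 8]
```

Each run of equal values spans the positions from one boundary up to, but not including, the next, so the last index of a run is the next boundary minus one.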

A further performance boost is possible by stacking the start and stop indices into two columns and repeating them with a single call -

def start_stop_arr_concat_repeat(initial_list):
    a = np.asarray(initial_list)
    mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
    idx = np.flatnonzero(mask)
    lens = np.diff(idx)
    # Stack (start, stop) pairs as columns so one repeat call expands both
    idx2 = np.concatenate((idx[:-1, None], idx[1:, None] - 1), axis=1)
    ss = np.repeat(idx2, lens, axis=0)
    return ss[:, 0], ss[:, 1]

Sample run -

In [38]: initial_list
Out[38]: array([0, 0, 0, 1, 1, 2, 3, 3, 3])

In [39]: start_stop_arr(initial_list)
Out[39]: (array([0, 0, 0, 3, 3, 5, 6, 6, 6]), array([2, 2, 2, 4, 4, 5, 8, 8, 8]))

Runtime test -

Other approach(es) -

# @Mohammed Elmahgiubi's soln
def reversed_app(initial_list): # input expected is a list
    reversed_initial_list = list(reversed(initial_list))
    first = [initial_list.index(i) for i in initial_list]
    last = list(reversed([(len(initial_list) - 
                           (reversed_initial_list.index(i) + 1)) 
                            for i in reversed_initial_list]))
    return first, last

def unique_app(a): # @B. M.'s soln
    _,ind1,inv1,cou1 = np.unique(a, return_index=True, return_inverse=True, 
                                 return_counts=True)
    return ind1[inv1],(ind1+cou1-1)[inv1]
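
Before comparing speed, it seems worth checking that the approaches return the same indices. The following is a minimal sketch of such a check, not part of the original answer: the functions are restated so the block runs on its own, and the seed, value range, and array size are arbitrary assumptions.

```python
import numpy as np

def start_stop_arr(initial_list):
    a = np.asarray(initial_list)
    mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
    idx = np.flatnonzero(mask)
    lens = np.diff(idx)
    return np.repeat(idx[:-1], lens), np.repeat(idx[1:] - 1, lens)

def unique_app(a):
    _, ind1, inv1, cou1 = np.unique(a, return_index=True,
                                    return_inverse=True, return_counts=True)
    return ind1[inv1], (ind1 + cou1 - 1)[inv1]

def reversed_app(initial_list):  # expects a plain Python list
    rev = list(reversed(initial_list))
    first = [initial_list.index(i) for i in initial_list]
    last = list(reversed([len(initial_list) - (rev.index(i) + 1) for i in rev]))
    return first, last

rng = np.random.default_rng(0)             # fixed seed, arbitrary choice
data = np.sort(rng.integers(0, 50, 500))   # small sorted test array

results = [start_stop_arr(data), unique_app(data), reversed_app(data.tolist())]
ref_first, ref_last = results[0]
# Compare every approach against the first one, element by element.
all_agree = all(np.array_equal(f, ref_first) and np.array_equal(l, ref_last)
                for f, l in results)
print(all_agree)
```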

Timings -

Case #1: Smaller dataset

In [295]: initial_list = np.random.randint(0,1000,(10000))
     ...: initial_list.sort()

In [296]: input_list = initial_list.tolist()

In [297]: %timeit reversed_app(input_list)
1 loop, best of 3: 789 ms per loop

In [298]: %timeit unique_app(initial_list)
1000 loops, best of 3: 353 µs per loop

In [299]: %timeit start_stop_arr(initial_list)
10000 loops, best of 3: 96.3 µs per loop

Case #2: Bigger dataset

In [438]: initial_list = np.random.randint(0,100000,(1000000))
     ...: initial_list.sort()

In [439]: %timeit unique_app(initial_list) # @B. M.'s soln
10 loops, best of 3: 53 ms per loop

In [440]: %timeit start_stop_arr(initial_list)
100 loops, best of 3: 9.64 ms per loop

In [441]: %timeit start_stop_arr_concat_repeat(initial_list)
100 loops, best of 3: 6.76 ms per loop
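
The timings above use IPython's `%timeit`. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module, as below. The dataset size, seed, and repeat counts are assumptions chosen to keep the run short, so the numbers will not match the figures quoted above.

```python
import timeit
import numpy as np

def start_stop_arr(initial_list):
    a = np.asarray(initial_list)
    mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
    idx = np.flatnonzero(mask)
    lens = np.diff(idx)
    return np.repeat(idx[:-1], lens), np.repeat(idx[1:] - 1, lens)

def unique_app(a):
    _, ind1, inv1, cou1 = np.unique(a, return_index=True,
                                    return_inverse=True, return_counts=True)
    return ind1[inv1], (ind1 + cou1 - 1)[inv1]

rng = np.random.default_rng(0)
data = np.sort(rng.integers(0, 10_000, 100_000))

timings = {}
for fn in (unique_app, start_stop_arr):
    # Best of 3 repeats, 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: fn(data), number=5, repeat=3)) / 5
    timings[fn.__name__] = best * 1e3
    print(f"{fn.__name__}: {timings[fn.__name__]:.3f} ms per call")
```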

This is my approach:

initial_list = [0, 0, 0, 1, 1, 2, 3, 3, 3]
reversed_initial_list = list(reversed(initial_list))

first = [initial_list.index(i) for i in initial_list]
last = list(reversed([(len(initial_list) - (reversed_initial_list.index(i) + 1)) for i in reversed_initial_list]))

print("initial_list = {}\nfirst = {}\nlast = {}".format(initial_list, first, last))

This results in:

initial_list = [0, 0, 0, 1, 1, 2, 3, 3, 3]
first = [0, 0, 0, 3, 3, 5, 6, 6, 6]
last = [2, 2, 2, 4, 4, 5, 8, 8, 8]
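
# Alternative: look up each value with np.where
initial_list = np.asarray(initial_list)  # elementwise == needs an array, not a list
first = [np.min(np.where(initial_list == i)) for i in initial_list]
last = [np.max(np.where(initial_list == i)) for i in initial_list]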

Reference: Is there a NumPy function to return the first index of something in an array?
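
The list comprehensions above scan the whole array once per element, so the cost grows quadratically with the input size. Because the input is already sorted, a binary search is one possible alternative. The sketch below uses `np.searchsorted`, which is a different technique from the `np.where` approach above and is shown only as an option.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 2, 3, 3, 3])  # must already be sorted

# side='left' returns the first position where each value could be inserted,
# which is the index of its first occurrence.
first = np.searchsorted(a, a, side='left')
# side='right' returns the position just past the last occurrence.
last = np.searchsorted(a, a, side='right') - 1

print(first)  # [0 0 0 3 3 5 6 6 6]
print(last)   # [2 2 2 4 4 5 8 8 8]
```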

numpy.unique computes all the values needed for that:

a=np.array([0, 0, 0, 1, 1, 2, 3, 3, 3])
_,ind1,inv1,cou1 = np.unique(a, return_index=True, return_inverse=True, return_counts=True)

print(ind1[inv1],(ind1+cou1-1)[inv1])

#[0 0 0 3 3 5 6 6 6] [2 2 2 4 4 5 8 8 8]
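
To make the indexing in that one-liner easier to follow, here is a short breakdown of what each returned array holds for the same input. The names `uniq` and `last_per_value` are introduced here purely for illustration.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 2, 3, 3, 3])
uniq, ind1, inv1, cou1 = np.unique(a, return_index=True,
                                   return_inverse=True, return_counts=True)

print(uniq)  # [0 1 2 3]            the distinct values
print(ind1)  # [0 3 5 6]            index of the first occurrence of each value
print(inv1)  # [0 0 0 1 1 2 3 3 3]  position in uniq of each element of a
print(cou1)  # [3 2 1 3]            number of occurrences of each value

# Because a is sorted, each value occupies one contiguous block, so its last
# index is the first index plus the count, minus one.
last_per_value = ind1 + cou1 - 1
print(last_per_value)  # [2 4 5 8]
```

Indexing `ind1` and `last_per_value` with `inv1` then broadcasts these per-value results back to one entry per element of `a`.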


 