Using indices with multiple values, how to get the smallest one

I have an index that I use to choose elements of an array. Sometimes the index has repeated entries; in that case I would like to keep the corresponding smaller value. Is that possible?

import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist
print(arr)

What I get:

[ 1.  0.  0.  1.  0.  3.]

What I would like to get:

[ 1.  0.  0.  1.  0.  1.]

Addendum

Actually I have a third array with the (vector) values to be inserted. So the problem is to insert the rows of values into arr at the positions given by index, as in the following. However, when multiple values have the same index, I want to keep the one corresponding to the minimum dist.

import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values
print(arr)

I get:

 [[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 6.  7.]]

I would like to get:

 [[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 4.  5.]]

Use groupby in pandas:

import numpy as np
import pandas as pd
index = [0,3,5,5]
dist = [1,1,1,3]
# group dist by index and keep the minimum of each group
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)
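For comparison, here is a NumPy-only sketch of the same scalar case. It swaps in a different technique, the unbuffered `np.minimum.at` ufunc method, and it assumes that `dist` never contains `inf`, because `inf` is used as the "not written yet" marker before being reset to zero.

```python
import numpy as np

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]

# Start from +inf so that any real distance is smaller than the initial value.
arr = np.full(6, np.inf)

# Unbuffered element-wise minimum: repeated indices are all applied,
# so position 5 ends up with min(1, 3).
np.minimum.at(arr, index, dist)

# Positions that were never written still hold +inf; reset them to 0.
arr[np.isinf(arr)] = 0
print(arr)
```

This avoids the pandas dependency, but it only covers the case where the inserted value is the distance itself, not the vector case from the addendum.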

If index is sorted, then itertools.groupby can be used to group the list.

import itertools
# each group is (key, iterator of (index, dist) pairs); keep the smallest dist
np.array([(k, min(d for _, d in g)) for k, g in
    itertools.groupby(zip(index, dist), key=lambda x: x[0])])

produces

array([[0, 1],
       [3, 1],
       [5, 1]])
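Since itertools.groupby only merges adjacent equal keys, an unsorted index would yield the same key more than once. A minimal sketch that sorts the pairs first is shown here; the function name `min_dist_per_index` is hypothetical and only used for illustration.

```python
import itertools

def min_dist_per_index(index, dist):
    """Return (index, smallest dist) pairs, one per distinct index."""
    # Sorting the pairs puts equal indices next to each other,
    # which is what itertools.groupby requires.
    pairs = sorted(zip(index, dist))
    return [(k, min(d for _, d in group))
            for k, group in itertools.groupby(pairs, key=lambda p: p[0])]

# Unsorted input with a repeated index 5.
print(min_dist_per_index([5, 0, 5, 3], [3, 1, 1, 1]))
```

The sort adds O(N log N) work, so this is a convenience for small inputs rather than a fast path.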

This is about 8x slower than the version using np.unique. So for N=1000 it is similar to the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. It looks like the Pandas approach has a substantial startup cost, which limits its speed for small N.
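The np.unique version referred to above is not reproduced on this page. As a rough sketch of how np.unique could handle the vector case from the addendum (not necessarily the version that was timed), the rows can be ordered with np.lexsort and the first row of each index group kept. The function name `insert_min_dist` and its `size` parameter are assumptions made for this example.

```python
import numpy as np

def insert_min_dist(index, dist, values, size):
    """Fill a zero array so each index holds the row of values with the smallest dist."""
    index = np.asarray(index)
    dist = np.asarray(dist)
    values = np.asarray(values)

    # lexsort uses the LAST key as the primary one: sort by index, then by dist.
    order = np.lexsort((dist, index))

    # In the sorted order, the first occurrence of each index has the smallest dist.
    uniq, first = np.unique(index[order], return_index=True)
    winners = order[first]  # row numbers in the original arrays

    arr = np.zeros((size,) + values.shape[1:])
    arr[uniq] = values[winners]
    return arr

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]
values = np.arange(8).reshape(4, 2)
print(insert_min_dist(index, dist, values, 6))
```

Because each target position is written exactly once, the result does not depend on how NumPy resolves repeated indices in an assignment.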
