With an index that has repeated entries, how to get the smallest value

Question

I have an index that selects elements of an array. Sometimes the index has repeated entries, and in that case I would like to keep the smaller of the corresponding values. Is that possible?

import numpy as np

index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
# with a repeated index, the last assigned value is the one that remains
arr[index] = dist
print(arr)

What I get:

[ 1.  0.  0.  1.  0.  3.]

What I would like to get:

[ 1.  0.  0.  1.  0.  1.]

Addendum

Actually I have a third array, values, holding the (vector) values to be inserted. So the problem is to insert rows from values into arr at the positions given by index, as in the following. However, when several rows share the same index, I want to keep the row that corresponds to the minimum dist.

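import numpy as np

index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
# row 5 ends up holding the last row assigned to it, not the one with minimum dist
arr[index] = values
print(arr)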

I get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 6.  7.]]

I would like to get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 4.  5.]]

Answer 1

Use groupby in pandas:

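import numpy as np
import pandas as pd

index = [0,3,5,5]
dist = [1,1,1,3]
# group dist by index and keep the minimum of each group
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
# s.index holds the distinct positions, s.values the per-position minima
arr[s.index] = s.values
print(arr)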
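
The code above covers the scalar case only. For the addendum, where rows of values must follow the minimum dist, one possible extension is to ask each group for the position of its minimum rather than the minimum itself. This is a sketch under the assumption that the Series keeps its default 0..N-1 index, so the labels returned by idxmin are also row positions in values; the name rows is only illustrative.

```python
import numpy as np
import pandas as pd

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]
values = np.arange(8).reshape(4, 2)

# For each distinct index, get the label of the entry with the smallest dist.
# With the default RangeIndex, that label is also the row position in `values`.
rows = pd.Series(dist).groupby(index).idxmin()

arr = np.zeros((6, 2))
# rows.index: target positions in arr; rows values: winning source rows.
arr[rows.index.to_numpy()] = values[rows.to_numpy()]
print(arr)
```

If two entries of a group tie on dist, idxmin returns the first one, so the earlier row wins.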

Answer 2

If index is sorted, then itertools.groupby can be used to group the list.

import itertools
# one (index, min dist) pair per run of equal index values
np.array([(k, min(d for _, d in g))
    for k, g in itertools.groupby(zip(index, dist), key=lambda x: x[0])])

produces

array([[0, 1],
       [3, 1],
       [5, 1]])
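
The grouping above relies on index being sorted and stops at the (index, minimum) pairs. A minimal sketch of how it might be applied to an unsorted index and written back into arr follows; the unsorted sample data and the names pairs and mins are made up for illustration.

```python
import itertools
import numpy as np

index = [5, 0, 5, 3]   # deliberately unsorted (illustrative data)
dist = [3, 1, 1, 1]

# Sorting the (index, dist) tuples orders them by index, then by dist,
# so the first tuple of every group carries that group's smallest dist.
pairs = sorted(zip(index, dist))
mins = [(k, next(g)[1])
        for k, g in itertools.groupby(pairs, key=lambda p: p[0])]

arr = np.zeros(6)
for k, d in mins:
    arr[k] = d
print(arr)
```

Sorting costs O(N log N) in pure Python, so this variant is mainly of interest for small inputs.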

The itertools.groupby version is about 8x slower than the version using np.unique. For N=1000 it is therefore about as fast as the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. The Pandas approach seems to have a substantial startup cost, which limits its speed for small N.
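
The np.unique version mentioned in the timing comparison is not shown in this thread. The following is only a guess at what such a version could look like, not the code that was timed: sort by index and then by dist with np.lexsort, and let np.unique report the first occurrence of each distinct index. The names order, uniq, first, winners and arr2 are illustrative.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# np.lexsort treats the LAST key as primary: sort by index, then by dist.
order = np.lexsort((dist, index))

# In the sorted index, the first occurrence of each distinct entry
# is the one with the smallest dist.
uniq, first = np.unique(index[order], return_index=True)
winners = order[first]   # positions in the original arrays

arr = np.zeros(6)
arr[uniq] = dist[winners]

arr2 = np.zeros((6, 2))
arr2[uniq] = values[winners]
print(arr)
print(arr2)
```

Because winners indexes the original arrays, the same selection serves both the scalar case and the vector case from the addendum.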
