Finding the 1D numpy array based on the max value in the second column of a 2D numpy array

I'm currently working on removing some 1D arrays (rows) from a 2D array based on the values in one of its columns. The first column may contain different and repeated values. For each repeated value I want to keep a single row, the one with the max value in the second column (this is just an example, the 2D array may be bigger). Here is what I tried:

import numpy as np

arr = np.array([[ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])     

biggest = 0
aux = []
for i in range(arr.shape[0]-1):
    j = i+1
    if (arr[i][0] == arr[j][0]):
        if (arr[i][1] < arr[j][1] and arr[j][1] > biggest):
            biggest = j
    if (arr[i][0] != arr[j][0]):
        aux.append(arr[biggest])

print(np.array(aux))

#Output = [[ 36.06 225.29]
#          [ 41.11 206.15]
#          [ 50.25 174.32]]

As you can see, I get almost the desired result. My expected result should be something like this:

Output = [[ 36.06 225.29]
          [ 41.11 206.15]
          [ 50.25 174.32]
          [ 59.03 148.79]]

The thing is, I'm missing the last array, and maybe there is an easier way using numpy built-in functions that I'm not aware of. Thank you in advance!

No reason to reinvent the wheel. Just use pandas.

import pandas as pd

pd.DataFrame(arr).groupby(0, as_index=False).max().to_numpy()

>> array([[ 36.06, 225.29],
          [ 41.11, 206.15],
          [ 50.25, 174.32],
          [ 59.03, 148.79]])
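
Note that max() is taken per column here, which coincides with the wanted row only because the array has exactly two columns. If the array had more columns and whole rows had to be kept, a sketch along the following lines might be closer to the intent. It uses idxmax to find, for each key, the row label where column 1 is largest. The three-column data and the names arr3 and result3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# hypothetical 3-column data: key, value, extra payload
arr3 = np.array([[36.06, 209.14, 1.0],
                 [36.06, 225.29, 2.0],
                 [41.11, 206.15, 3.0],
                 [41.11, 186.76, 4.0],
                 [59.03, 148.79, 5.0]])

df = pd.DataFrame(arr3)
# for each key in column 0, the row label where column 1 is largest
rows = df.groupby(0)[1].idxmax()
# select those complete rows, including the extra column
result3 = df.loc[rows].to_numpy()
print(result3)
```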

Alternative

The input appears to be sorted by both columns, meaning the highest value per key is always in the last row for that key. If that is the case, or if it can be arranged by sorting, a plain numpy version is also possible.

# if not already sorted, sort by column 0, then by column 1 within each key
sorted_array = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
# mark the last row of each key (the final row always ends a group)
keys = sorted_array[:, 0]
ends = np.append(keys[1:] != keys[:-1], True)
# extract rows
result = sorted_array[ends]

If we include the cost of sorting, this has a higher computational complexity than the pandas version (assuming the pandas version uses hash tables; I haven't checked). The shape of the data and the quality of the implementation may change the actual runtime.
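
To check that the sorting step really makes the approach independent of the input order, the same steps can be wrapped in a function and run on deliberately shuffled rows. This is only a sketch: the function name max_row_per_key and the shuffled sample are assumptions made for illustration.

```python
import numpy as np

def max_row_per_key(arr):
    """For each distinct value in column 0, return the row with the largest column 1."""
    # sort by column 0 first, then by column 1 within equal keys
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    sorted_array = arr[order]
    keys = sorted_array[:, 0]
    # a row ends its group when the next key differs; the last row always does
    ends = np.append(keys[1:] != keys[:-1], True)
    return sorted_array[ends]

# deliberately unsorted input
shuffled = np.array([[41.11, 191.79],
                     [36.06, 225.29],
                     [59.03, 148.79],
                     [41.11, 206.15],
                     [36.06, 209.14],
                     [50.25, 174.32],
                     [50.25, 165.51]])
result = max_row_per_key(shuffled)
print(result)
```

The result comes back sorted by key, so the original row order is not preserved.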

One way is to apply np.unique to the first column to find the unique values in that column (note that np.unique returns the unique values in sorted order by default, which works for your example), then find the index of the maximum value in the second column for each unique value and append that row to your list:

aux = []
for i in np.unique(arr[:, 0]):
    arr_ = arr[arr[:, 0] == i]
    aux.append(arr_[arr_[:, 1].argmax()])

Or, using a preallocated array instead of appending to a list:

uniques_ = np.unique(arr[:, 0])
# [36.06 41.11 50.25 59.03]

result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 36.06 225.29]
#  [ 41.11 206.15]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]

To preserve the order in which the values first appear in the first column when using np.unique, if we have:

arr = np.array([[ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])

_, idx = np.unique(arr[:, 0], return_index=True)
uniques_ = arr[:, 0][np.sort(idx)]
result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 41.11 206.15]
#  [ 36.06 225.29]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]
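
The loops above scan the whole array once per unique key. For the two-column case, a loop-free variant could be sketched with np.unique(..., return_inverse=True) and np.maximum.at, which accumulates a per-group maximum. This is an alternative technique rather than the method used above, it only returns the key and the maximum (not any further columns), and the shortened sample array is an assumption for illustration.

```python
import numpy as np

# shortened sample with keys in non-sorted order
arr = np.array([[41.11, 186.76],
                [41.11, 206.15],
                [36.06, 209.14],
                [36.06, 225.29],
                [50.25, 174.32],
                [59.03, 148.79]])

# sorted unique keys, index of each key's first appearance, group id of every row
keys, first_idx, group = np.unique(arr[:, 0], return_index=True, return_inverse=True)

# accumulate the maximum of column 1 per group without a Python-level loop
maxes = np.full(keys.shape[0], -np.inf)
np.maximum.at(maxes, group, arr[:, 1])

# restore the order of first appearance
order = np.argsort(first_idx)
result = np.column_stack((keys[order], maxes[order]))
print(result)
```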
