
Get the indexes of elements in a NumPy array

I have a NumPy integer array with a lot of duplicate elements.

For example:

import numpy as np

a = np.random.randint(0,5,20)
a
Out[23]:
array([3, 1, 2, 4, 1, 2, 4, 3, 2, 3, 1, 4, 4, 1, 2, 4, 2, 4, 1, 1])

There are two cases:

  1. if an element occurs fewer than 4 times, get all of its indexes
  2. if an element occurs 4 or more times, select four of its indexes at random
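Judging from the loop below and its output, "less than 4" refers to how many times a value occurs, not to the value itself. As a small illustration (not part of the original question), `np.unique` with `return_counts=True` shows which case each value of the example array falls into and how many indexes each one should contribute:

```python
import numpy as np

a = np.array([3, 1, 2, 4, 1, 2, 4, 3, 2, 3, 1, 4, 4, 1, 2, 4, 2, 4, 1, 1])
values, counts = np.unique(a, return_counts=True)
print(values)                    # [1 2 3 4]
print(counts)                    # [6 5 3 6]
kept = np.minimum(counts, 4)     # case 1 keeps every index, case 2 keeps 4
print(kept, kept.sum())          # [4 4 3 4] 15
```

So 15 indexes in total are expected for this array.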

I solved this with a loop.

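ans = np.array([], dtype=int)   # integer dtype, so the result can be used for indexing
num = 4
for i in np.unique(a):          # every distinct value (randint(0,5,20) can also produce 0)
    indexes = np.where(a == i)[0] # all indexes of elements equal to i
    # all indexes if i occurs fewer than num times, else num of them at random
    index_i = np.random.choice(indexes, num, replace=False) if len(indexes) >= num else indexes
    ans = np.concatenate([ans, index_i])

np.sort(ans)

Out[57]:
array([ 0,  1,  2,  5,  6,  7,  8,  9, 10, 11, 13, 14, 15, 17, 19])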

Can I solve this problem without a loop, or more efficiently, in NumPy or PyTorch?
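One loop-free NumPy approach (not taken from the answers in this thread, so treat it as a sketch) is: visit the positions in random order, stable-sort them by value so equal values form contiguous runs whose internal order stays random, then keep the first 4 positions of each run. `sample_indexes_per_value` is a hypothetical helper name; the sketch assumes a 1-D array and NumPy 1.17 or newer (for `np.random.default_rng`). It replaces the per-value Python loop with a single sort, which should help mostly when there are many distinct values.

```python
import numpy as np


def sample_indexes_per_value(a, num=4, rng=None):
    """For each distinct value in the 1-D array `a`, keep all of its indexes
    if it occurs fewer than `num` times, otherwise `num` of them at random.
    Returns the kept indexes in ascending order."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    n = a.size

    # 1. Visit the positions in a random order ...
    perm = rng.permutation(n)
    # 2. ... then stable-sort by value: equal values become contiguous runs,
    #    and inside each run the positions keep their random order.
    order = perm[np.argsort(a[perm], kind="stable")]
    sorted_vals = a[order]

    # 3. Find where each run of equal values starts and how long it is.
    is_start = np.ones(n, dtype=bool)
    is_start[1:] = sorted_vals[1:] != sorted_vals[:-1]
    starts = np.flatnonzero(is_start)
    run_lengths = np.diff(np.append(starts, n))

    # 4. Rank of every element inside its run (0, 1, 2, ...). Keeping the
    #    first `num` of each run is a random sample without replacement.
    rank = np.arange(n) - np.repeat(starts, run_lengths)
    return np.sort(order[rank < num])


a = np.array([3, 1, 2, 4, 1, 2, 4, 3, 2, 3, 1, 4, 4, 1, 2, 4, 2, 4, 1, 1])
idx = sample_indexes_per_value(a, num=4)
print(idx)  # 15 sorted indexes; which ones varies from run to run
```

Passing a seeded generator, e.g. `np.random.default_rng(0)`, makes the selection reproducible.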

You can do it quite easily using Pandas.

First, convert your array to a pandas Series:

import pandas as pd

s = pd.Series(a)

Then:

  • Group it by its value.
  • Apply a function to each group that:
    • for groups of 4 or fewer elements, returns the group unchanged,
    • for larger groups, returns a random sample of 4 of their elements.
  • Drop the 0-th level of the resulting index (added during grouping).
  • Sort by the (original) index to restore the original order. The dropped elements are gone; what remains are the original values with their corresponding indices.
  • Return the index of the above result as a NumPy array.

The code to do it is:

s.groupby(s).apply(lambda grp: grp if grp.size <= 4 else grp.sample(4))\
    .reset_index(level=0, drop=True).sort_index().index.values

For a sample array containing:

array([2, 2, 1, 0, 1, 0, 2, 2, 2, 3, 0, 2, 1, 0, 0, 3, 3, 0, 2, 4])

the result is:

array([ 0,  2,  4,  5,  7,  9, 10, 11, 12, 14, 15, 16, 17, 18, 19])

To show that this result is correct, I repeated the source array with "x" marks below the elements at the returned indices.

array([2, 2, 1, 0, 1, 0, 2, 2, 2, 3, 0, 2, 1, 0, 0, 3, 3, 0, 2, 4])
       x     x     x  x     x     x  x  x  x     x  x  x  x  x  x
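The visual "x" check can also be done programmatically. The sketch below uses a hypothetical helper, `check_selection` (not part of NumPy or pandas), and assumes the count-based reading of the question: every value must contribute all of its positions if it occurs fewer than 4 times, otherwise exactly 4, and no position may be picked twice. Applied to the sample array and result above:

```python
import numpy as np


def check_selection(a, idx, num=4):
    """True if `idx` contains, for every distinct value of `a`, all of its
    positions when it occurs fewer than `num` times, else exactly `num`."""
    a = np.asarray(a)
    idx = np.asarray(idx)
    if np.unique(idx).size != idx.size:      # an index may be picked only once
        return False
    values, counts = np.unique(a, return_counts=True)
    picked = a[idx]
    # compare the number of picks per value with min(occurrences, num)
    return all(np.count_nonzero(picked == v) == min(c, num)
               for v, c in zip(values, counts))


a = np.array([2, 2, 1, 0, 1, 0, 2, 2, 2, 3, 0, 2, 1, 0, 0, 3, 3, 0, 2, 4])
res = np.array([0, 2, 4, 5, 7, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19])
print(check_selection(a, res))   # True
```

Because the selection is random, checking a property like this is more useful in tests than comparing against one fixed output.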

Yes, you can do this with NumPy:

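import numpy as np

a = np.random.randint(0,10,20)
print(a)

num = 4
small = np.where(a < num)[0]        # indexes of elements whose value is < num
if small.size > 0:                  # Condition 1
    ans = small
    print(ans)
large = np.where(a >= num)[0]       # indexes of elements whose value is >= num
if large.size > 0:                  # Condition 2
    # note: this draws 4 values (not indexes), with replacement
    ans = np.random.choice(a[large], 4)
    print(ans)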

'''Output:
[9 9 8 1 0 7 7 4 6 2 8 2 1 2 9 5 5 1 4 1]
[ 3  4  9 11 12 13 17 19]
[4 9 8 7]
'''

I have handled only the cases you mentioned. There can be many other cases, such as when both conditions are true, or when there are fewer than 4 numbers in the second case.
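A possible variant of this answer, keeping its value-based reading (which differs from the count-based rule the question's loop implements), returns indexes instead of values, samples without replacement, reports both conditions at once, and copes with fewer than 4 candidates in the second case. `split_by_value`, `threshold` and `k` are hypothetical names chosen for this sketch; it assumes a 1-D array and NumPy 1.17 or newer.

```python
import numpy as np


def split_by_value(a, threshold=4, k=4, rng=None):
    """Value-based reading of the question: indexes of elements below
    `threshold`, plus up to `k` distinct random indexes of the rest."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    small = np.flatnonzero(a < threshold)     # condition 1: all such indexes
    large = np.flatnonzero(a >= threshold)    # condition 2: candidates
    # without replacement, and never more than are available
    picked = rng.choice(large, size=min(k, large.size), replace=False)
    return small, np.sort(picked)


a = np.array([9, 9, 8, 1, 0, 7, 7, 4, 6, 2, 8, 2, 1, 2, 9, 5, 5, 1, 4, 1])
small, picked = split_by_value(a)
print(small)    # [ 3  4  9 11 12 13 17 19]
print(picked)   # 4 indexes of elements >= 4; varies between runs
```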

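Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.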

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM