Fill an array using the values of another array as the indices. If an index is repeated, prioritize according to a parallel array

Description

I have an array a with N integer elements that range from 0 to M-1. I have another array b with N positive numbers.

Then, I want to create an array c with M elements. The i-th element of c should be the index of the element of a whose value is i.

  • If more than one such index exists, take the one with the highest value in b.
  • If none exists, the i-th element of c should be -1.

Example

N = 5, M = 3

a = [2, 1, 1, 2, 2]
b = [1, 3, 5, 7, 3]

Then, c should be:

c = [-1, 2, 3]
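To pin down the specification, here is a minimal brute-force sketch in plain Python. It is only meant as a reference for checking faster versions, not as a proposed solution: it runs in O(N*M). The function name `reference_fill` is hypothetical, and the tie rule (on equal values in b, the larger position wins) is an assumption, since the description above does not say what to do with ties.

```python
def reference_fill(a, b, M):
    """Brute-force reference: for each i, scan a for positions holding i."""
    c = [-1] * M
    for i in range(M):
        # All positions j in a whose value is i.
        candidates = [j for j in range(len(a)) if a[j] == i]
        if candidates:
            # Highest b wins; on equal b, the larger position wins (assumption).
            c[i] = max(candidates, key=lambda j: (b[j], j))
    return c


print(reference_fill([2, 1, 1, 2, 2], [1, 3, 5, 7, 3], 3))  # [-1, 2, 3]
```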

My Solution 1

A possible approach is to initialize an array d that stores the current maximum for each index, and then loop through a and b, updating the maximums.

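import numpy as np

c = np.full(M, -1)  # integer array; -1 marks "no element of a has this value"
d = np.zeros(M)     # best value of b seen so far for each index (b > 0)
for i, (idx, val) in enumerate(zip(a, b)):
    if d[idx] <= val:  # on ties, the later position wins
        c[idx] = i
        d[idx] = val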

This solution is O(N) in time but iterates over the arrays in Python, which makes it slow.

My Solution 2

Another solution is to sort a using b as the key. Then we can simply assign the indices of a to c (the elements with the largest b come last, so they overwrite the earlier ones).

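sort_idx = np.argsort(b)  # positions of a, ordered by ascending b

a_idx = np.arange(len(a))
a = a[sort_idx]           # note: rebinds a to its sorted copy
a_idx = a_idx[sort_idx]

c = np.full(M, -1)        # integer array; -1 marks "no match"
c[a] = a_idx              # relies on the last write winning for repeated indices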

This solution does not require Python loops but requires sorting b, making it O(N*log(N)).
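One caveat with the sort-based idea: it depends on which write survives when an index is repeated in a fancy-indexed assignment, and the NumPy documentation is not entirely consistent about whether that order is guaranteed. A possible variant, sketched below under that concern, sorts by (a, b) with np.lexsort and writes only the last entry of each group, so every target index is written once. It is still O(N*log(N)). The function name `fill_by_priority` is hypothetical, and on equal values in b the later position wins because lexsort is stable.

```python
import numpy as np


def fill_by_priority(a, b, M):
    """Sort-based variant that never writes the same index twice."""
    a = np.asarray(a)
    b = np.asarray(b)
    c = np.full(M, -1, dtype=np.intp)
    if a.size == 0:
        return c
    # Primary key a, secondary key b (lexsort uses the LAST key as primary).
    order = np.lexsort((b, a))
    a_sorted = a[order]
    # Mark the last element of each run of equal values in a_sorted;
    # within a run, that is the one with the highest b.
    is_last = np.empty(a_sorted.size, dtype=bool)
    is_last[:-1] = a_sorted[1:] != a_sorted[:-1]
    is_last[-1] = True
    c[a_sorted[is_last]] = order[is_last]
    return c


print(fill_by_priority([2, 1, 1, 2, 2], [1, 3, 5, 7, 3], 3).tolist())  # [-1, 2, 3]
```

Unlike the snippet above, this version leaves a untouched and returns an integer array.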

Ideal Solution

Is there a solution to this problem in linear time without having to loop over the array in Python?

AFAIK, this cannot currently be implemented in O(n) with NumPy (mainly because the index table is both read and written depending on the values of another array). Note that np.argsort(b) can theoretically be implemented in O(n) using a radix sort, but NumPy does not currently provide such a sort for general inputs (and it would not be much faster in practice due to the poor cache locality of the algorithm on big arrays).
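The closest NumPy primitive to that read-and-write pattern is probably the unbuffered `ufunc.at` method. The sketch below is an assumption on my part rather than a tested recommendation: it makes a fixed number of passes over the data with `np.maximum.at`, but `ufunc.at` has a reputation for being slow in some NumPy versions, and nothing here has been benchmarked. The function name `fill_by_scatter_max` is hypothetical; it relies on b being strictly positive, and on equal values in b the larger position wins.

```python
import numpy as np


def fill_by_scatter_max(a, b, M):
    """Scatter-max sketch: find the best b per index, then the winning position."""
    a = np.asarray(a, dtype=np.intp)
    b = np.asarray(b)
    # Per-index maximum of b; 0 means "unset" because b is assumed positive.
    best = np.zeros(M, dtype=b.dtype)
    np.maximum.at(best, a, b)
    # Positions whose b equals the maximum recorded for their index.
    winners = np.flatnonzero(b == best[a])
    c = np.full(M, -1, dtype=np.intp)
    # Resolve ties deterministically: keep the largest winning position.
    np.maximum.at(c, a[winners], winners)
    return c


print(fill_by_scatter_max([2, 1, 1, 2, 2], [1, 3, 5, 7, 3], 3).tolist())  # [-1, 2, 3]
```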

One solution is to use Numba to speed up your algorithmically efficient solution. Numba uses a JIT compiler to speed up loops. Here is an example (working with np.int32 types):

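import numpy as np
import numba as nb

# M is passed explicitly so the compiled function does not depend on a global.
@nb.njit('int32[:](int32[:], int32[:], int64)')
def compute(a, b, M):
    c = np.full(M, -1, dtype=np.int32)  # -1 marks "no match"
    d = np.zeros(M, dtype=np.int32)     # best value of b seen per index
    for i, (idx, val) in enumerate(zip(a, b)):
        if d[idx] <= val:  # on ties, the later position wins
            c[idx] = i
            d[idx] = val
    return c

a = np.array([2, 1, 1, 2, 2], dtype=np.int32)
b = np.array([1, 3, 5, 7, 3], dtype=np.int32)
c = compute(a, b, 3)  # M = 3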
