
Rearrange 2D NumPy Array Efficiently

Let's say I have a 2D NumPy array:

x = np.random.rand(100, 100000)

And I retrieve the column-wise sorted indices (i.e., each column is sorted independently of the others and the indices are returned):

idx = np.argsort(x, axis=0) 

Then, for each column, I need the values at indices = [10, 20, 30, 40, 50] to occupy the first 5 rows (of that column), followed by the rest of the sorted values (not the indices).

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)  # same dtype as x, since we store values

for col in range(x.shape[1]):
    # For each column, fill the first few rows with `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):  # walk every sorted position in this column
        src = idx[row, col]  # original row of the value at sorted position `row`
        if src not in indices:
            out[n, col] = x[src, col]  # Again, note that we want the value, not the index
            n += 1
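To make the target layout concrete, here is a tiny worked example. It builds the expected result in a different way from the loop above (drop the selected rows with np.delete, then sort what remains), purely as a reference for checking other approaches. The names x_small, indices_small, rest and expected are made up for illustration, and the sketch assumes the entries of the index array are distinct.

```python
import numpy as np

# Made-up toy data: 5 rows, 2 columns
x_small = np.array([[5, 1],
                    [3, 9],
                    [8, 2],
                    [0, 7],
                    [4, 6]])
indices_small = np.array([1, 3])  # rows that must come first, in this order

# Rows not listed in `indices_small`, sorted independently per column
rest = np.sort(np.delete(x_small, indices_small, axis=0), axis=0)

expected = np.vstack((x_small[indices_small], rest))
print(expected)
# [[3 9]
#  [0 7]
#  [4 1]
#  [5 2]
#  [8 6]]
```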

Approach #1

Here's one based on a previous post that doesn't need idx -

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
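The trick in Approach #1 seems to be that the selected rows are overwritten, in a copy, with keys strictly below the global minimum and increasing in the order of indices, so argsort places them first and in that order. A small sketch of that reading follows; the array size, seed, and the names keys, order and expected are assumptions for illustration, and it assumes distinct indices and float data where subtracting small integers from the minimum stays exact enough to keep the keys distinct.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 4))
indices = np.array([6, 2, 5])

xc = x.copy()
# Keys min-3, min-2, min-1: all smaller than any real value, in increasing order
keys = xc.min() - np.arange(len(indices), 0, -1)
xc[indices] = keys[:, None]          # broadcast one key per selected row

order = xc.argsort(0)                # per column: selected rows first, then the rest sorted
out = np.take_along_axis(x, order, axis=0)   # gather the real values from x

# Reference built independently: selected rows, then the remaining rows sorted
expected = np.vstack((x[indices],
                      np.sort(np.delete(x, indices, axis=0), axis=0)))
print(np.array_equal(out, expected))
```

Because the values are gathered from x rather than from the modified copy, the placeholder keys never appear in the result.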

Approach #2

Another with np.isin masking that uses idx -

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
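The double transpose in Approach #2 is easy to misread, so here is the same computation spelled out step by step on a small array. Boolean indexing flattens in row-major order, which is presumably why the mask is applied to the transposes: that keeps each column's surviving entries contiguous before reshaping. The sizes, seed, and the names rest_idx and expected are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((6, 3))
indices = np.array([4, 1])
idx = np.argsort(x, axis=0)

# True wherever a sorted position points at one of the selected rows
mask = np.isin(idx, indices)

# Each column of idx is a permutation of the rows, so every column loses
# exactly len(indices) entries and the reshape below is well defined
rest_idx = idx.T[~mask.T].reshape(x.shape[1], -1).T   # shape (6 - 2, 3)

p2 = np.take_along_axis(x, rest_idx, axis=0)          # remaining values, sorted per column
out = np.vstack((x[indices], p2))

expected = np.vstack((x[indices],
                      np.sort(np.delete(x, indices, axis=0), axis=0)))
print(np.array_equal(out, expected))
```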

Approach #2 - Alternative

If you are continuously writing into out to change everything except those indices, an array assignment might be the one for you -

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower

This should help you get started by eliminating the innermost loop and the if condition. To start off, you can pass in x[:, col] as the input parameter x (and idx[:, col] as idx).

def custom_ordering(x, idx, indices):
    # `x` and `idx` are 1D here: a single column and its argsort

    # First get only the desired values at the top
    out = x[indices]

    # drop the entries of `idx` whose values are in `indices`
    # (np.delete would remove by position, not by value)
    idx2 = idx[~np.isin(idx, indices)]

    # select the `idx2` values and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)

    return out
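A per-column helper like this still needs an outer loop over the columns. The sketch below shows one way that could look; order_column is a hypothetical stand-in written for this example (it takes one column and that column's argsort), and the array size and seed are made up. With 100000 columns the Python-level loop will likely be much slower than a fully vectorized version, so treat this as a stepping stone rather than a final answer.

```python
import numpy as np

def order_column(col, col_idx, indices):
    """Hypothetical helper: selected values first, then the rest in sorted order."""
    # Keep the sorted positions whose original row is NOT one of `indices`
    rest = col_idx[~np.isin(col_idx, indices)]
    return np.concatenate((col[indices], col[rest]))

rng = np.random.default_rng(2)
x = rng.random((7, 5))
indices = np.array([0, 3])
idx = np.argsort(x, axis=0)

# One call per column, then stitch the 1D results back together as columns
out = np.column_stack([order_column(x[:, c], idx[:, c], indices)
                       for c in range(x.shape[1])])

expected = np.vstack((x[indices],
                      np.sort(np.delete(x, indices, axis=0), axis=0)))
print(np.array_equal(out, expected))
```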

Here is my solution to the problem:

rem_indices = [i for i in range(x.shape[0]) if i not in indices]  # get all remaining indices
xs = np.take_along_axis(x, idx, axis=0)  # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]  # insert the remaining values after the previous

Tell me if I understood your problem correctly.

I do this with a smaller array and fewer indices so that I can easily sanity-check the results, but it should translate to your use case. I think this solution is decently efficient, as everything is done in place.

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0) 

Here it is with the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])
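The swap only works when none of the selected rows already sits in the first len(indices) rows, because otherwise the two assignments overwrite each other. Below is a minimal sketch that wraps the same idea in a function with an explicit guard and works on a copy instead of mutating its input; the function name swap_then_sort and the error message are made up for this example, and distinct indices are assumed. The input reuses the array from the session above.

```python
import numpy as np

def swap_then_sort(x, indices):
    """Hypothetical wrapper around the swap-and-sort idea; returns a new array."""
    indices = np.asarray(indices)
    n = len(indices)
    if (indices < n).any():
        # A selected row inside the top block would be clobbered by the swap
        raise ValueError("every index must be >= len(indices) for the swap to be safe")
    out = x.copy()
    # Read from the untouched `x`, write into `out`, so no temporary copy is needed
    out[:n], out[indices] = x[indices], x[:n]
    out[n:].sort(axis=0)   # in-place sort of the lower block (a view into `out`)
    return out

x_demo = np.array([[7, 1, 8], [7, 4, 6], [6, 5, 2], [6, 8, 4], [2, 0, 2],
                   [3, 0, 4], [4, 7, 4], [3, 1, 1], [3, 5, 3], [0, 5, 9]])
result = swap_then_sort(x_demo, [5, 7, 9])
print(result[:3])
# [[3 0 4]
#  [3 1 1]
#  [0 5 9]]
```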

EDIT: This version should handle the case where any element of indices is less than len(indices).

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less than or equal to -1. If you
# use floats, you can use -np.inf, but there is no such equivalent for ints (which
# I use in my example).
x[indices] = -1

# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
x[:len(indices)] = tmp  # after sorting, the dummy rows are the first len(indices) rows
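For float data, the comment above suggests -np.inf as the placeholder. A minimal sketch of that variant follows, assuming the data itself contains no -inf or NaN values and that the indices are distinct. The seed, the array size, and the names original and expected are made up for checking. Note that the saved rows are written back to the first len(indices) rows, since that is where the placeholder rows end up after sorting.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random((12, 3))
indices = np.array([1, 2, 4])
original = x.copy()          # kept only to verify the result afterwards

tmp = x[indices]             # fancy indexing returns a copy, so this survives the sort
x[indices] = -np.inf         # placeholders that sort ahead of every real value
x.sort(axis=0)               # in place; each column now starts with len(indices) placeholders
x[:len(indices)] = tmp       # overwrite the placeholders with the saved rows

expected = np.vstack((original[indices],
                      np.sort(np.delete(original, indices, axis=0), axis=0)))
print(np.array_equal(x, expected))
```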


 