Efficiently reorder a 2D NumPy array

Question

Let's say I have a 2D NumPy array:

x = np.random.rand(100, 100000)

And I retrieve the column-wise sorted indices (i.e., each column is sorted independently of the others, and the indices are returned):

idx = np.argsort(x, axis=0)

Then, for each column, I need the values at indices = [10, 20, 30, 40, 50] to fill the first 5 rows (of that column), followed by the rest of the sorted values (the values, not the indices).
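To make the requirement concrete, here is a sketch on a single tiny column (the values and the two-element indices array are made up for illustration): the values in the rows named by indices come first, in the given order, and the remaining values follow in ascending order.

```python
import numpy as np

col = np.array([0.5, 0.1, 0.9, 0.3, 0.7])  # hypothetical column values
indices = np.array([2, 0])                 # rows whose values must come first

order = np.argsort(col)                    # row numbers in ascending value order
rest = order[~np.isin(order, indices)]     # sorted row numbers, minus `indices`
result = np.concatenate((col[indices], col[rest]))
print(result)  # [0.9 0.5 0.1 0.3 0.7]
```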

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)  # keep x's float dtype

for col in range(x.shape[1]):
    # For each column, fill the first few rows with the values at `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values, excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):  # walk every sorted position of this column
        if idx[row, col] not in indices:
            out[n, col] = x[idx[row, col], col]  # Again, note that we want the value, not the index
            n += 1

Answer 1

Approach #1

Here's one, based on a previous post, that doesn't need idx:

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
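A quick way to sanity-check Approach #1 is to compare it with a straightforward per-column loop on a small array. The sketch assumes the intended result is the one described in the question (values from the rows in indices first, then each column's remaining values in ascending order); `reference` is a hypothetical helper written only for this comparison.

```python
import numpy as np

def reference(x, indices):
    """Hypothetical per-column loop used only to check the vectorized result."""
    out = np.empty_like(x)
    n = len(indices)
    for col in range(x.shape[1]):
        order = np.argsort(x[:, col])
        rest = order[~np.isin(order, indices)]  # sorted rows, minus `indices`
        out[:n, col] = x[indices, col]
        out[n:, col] = x[rest, col]
    return out

rng = np.random.default_rng(0)
x = rng.random((20, 4))
indices = np.array([3, 7, 11])

# Approach #1: push the rows in `indices` below the global minimum, in order,
# so that argsort places them first; then gather the original values.
xc = x.copy()
xc[indices] = (xc.min() - np.arange(len(indices), 0, -1))[:, None]
out = np.take_along_axis(x, xc.argsort(0), axis=0)

print(np.array_equal(out, reference(x, indices)))  # True
```

One caveat: the sentinels are computed as xc.min() - k, so with an integer dtype whose minimum is already close to the type's lower bound, that subtraction could overflow.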

Approach #2

Another one, using np.isin masking, that does use idx:

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
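The transposes in Approach #2 matter because boolean-mask indexing returns the selected elements in C (row-major) order. Masking idx.T instead walks idx one column at a time, so each column's surviving sorted positions come out contiguously and can be reshaped to (columns, rows - len(indices)). The reshape is valid because every column of idx is a permutation of the row numbers, so each column loses exactly len(indices) entries (assuming indices has no duplicates). A small illustration on a made-up 6x2 array:

```python
import numpy as np

x = np.array([[5., 0.],
              [3., 9.],
              [8., 4.],
              [1., 7.],
              [6., 2.],
              [2., 6.]])
indices = np.array([1, 4])

idx = np.argsort(x, axis=0)
mask = np.isin(idx, indices)

# Masking the transpose yields the kept row numbers column by column.
kept = idx.T[~mask.T].reshape(x.shape[1], -1).T
p2 = np.take_along_axis(x, kept, axis=0)
out = np.vstack((x[indices], p2))
print(out)
```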

Approach #2 - Alternative

If you are continuously editing into out to change everything except those indices, an array assignment might be the one for you:

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower
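If out is allocated once and refilled repeatedly, the alternative can be wrapped in a function that writes into a caller-supplied buffer. `fill_reordered` is a hypothetical name; its body is the alternative above, with x, idx, indices and out passed in as parameters. Note that take_along_axis still allocates a temporary for lower, so this only avoids reallocating out itself.

```python
import numpy as np

def fill_reordered(x, idx, indices, out):
    """Hypothetical wrapper: write the reordered result into a preallocated `out`."""
    n = len(indices)
    out[:n] = x[indices]
    mask = np.isin(idx, indices)
    lower = np.take_along_axis(x, idx.T[~mask.T].reshape(x.shape[1], -1).T, axis=0)
    out[n:] = lower
    return out

rng = np.random.default_rng(1)
indices = np.array([10, 20, 30, 40, 50])
out = np.empty((100, 1000))  # allocated once, reused below

for _ in range(3):
    x = rng.random((100, 1000))
    idx = np.argsort(x, axis=0)
    fill_reordered(x, idx, indices, out)
    # The first rows hold x[indices]; the rest of each column is ascending.
    assert np.array_equal(out[:5], x[indices])
    assert np.all(np.diff(out[5:], axis=0) >= 0)
```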

Answer 2

This should help you get started by eliminating the innermost loop and the if condition. To start off, you can pass in x[:, col] as the input parameter x (and idx[:, col] as idx).

def custom_ordering(x, idx, indices):
    # `x` and `idx` are single columns here, e.g. x[:, col] and idx[:, col]
    # First get only the desired indices at the top
    out = x[indices]

    # Remove the *values* in `indices` from `idx` (np.delete would remove by position)
    idx2 = idx[~np.isin(idx, indices)]

    # Select the `idx2` rows (the remaining values, in sorted order) and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)

    return out
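Removing the row numbers in indices from a column of sorted positions needs value-based removal. np.delete removes entries by position, whereas a boolean np.isin mask removes by value and keeps the remaining entries in their sorted order. A small illustration with a made-up order array:

```python
import numpy as np

order = np.array([4, 0, 3, 1, 2])   # hypothetical sorted row numbers of one column
indices = np.array([0, 3])

by_position = np.delete(order, indices)     # drops the entries at positions 0 and 3
by_value = order[~np.isin(order, indices)]  # drops the entries equal to 0 and 3
print(by_position)  # [0 3 2]
print(by_value)     # [4 1 2]
```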

Answer 3

Here is my solution to the problem:

rem_indices = np.setdiff1d(np.arange(x.shape[0]), indices)  # all remaining row indices
xs = np.take_along_axis(x, idx, axis=0)                     # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]      # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]  # insert the remaining values after the previous ones

Tell me if I understood your problem correctly.
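In this answer, xs[indices, :] picks rows 10, 20, ... of the sorted array (the 11th, 21st, ... smallest value in each column), whereas the naive loop in the question places x[indices, col], the values originally stored in those rows, first. The tiny example below (a made-up single column) shows that the two can differ:

```python
import numpy as np

x = np.array([[4.], [1.], [3.], [2.]])  # made-up single column
indices = np.array([0])

xs = np.take_along_axis(x, np.argsort(x, axis=0), axis=0)  # column sorted ascending
print(xs[indices, 0])  # [1.]  (sorted position 0, the smallest value)
print(x[indices, 0])   # [4.]  (the value originally stored in row 0)
```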

Answer 4

I do this with a smaller array and fewer indices so that I can easily sanity-check the results, but it should translate to your use case. I think this solution is decently efficient, as the reordering is done in place (only the swapped rows are copied).

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0)
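The swap assumes that none of the indices fall inside the first len(indices) rows; otherwise the two assignments overwrite each other. Under that assumption, the in-place result can be checked against a construction that leaves x untouched (the `expected` array below is built only for this comparison):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(10, size=(12, 3))
indices = np.array([5, 7, 9])

# Expected result built without modifying x: rows from `indices` first,
# then each column's remaining values in ascending order.
rest = np.setdiff1d(np.arange(x.shape[0]), indices)
expected = np.vstack((x[indices], np.sort(x[rest], axis=0)))

# The swap-and-sort version, applied to a copy.
y = x.copy()
y[:len(indices)], y[indices] = y[indices], y[:len(indices)].copy()
y[len(indices):].sort(axis=0)

print(np.array_equal(y, expected))  # True
```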

Here it is with the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # row 5
       [4, 7, 4],
       [3, 1, 1],  # row 7
       [3, 5, 3],
       [0, 5, 9]]) # row 9

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

EDIT: This version should handle the case where any element of indices is less than len(indices):

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1

# The -1 will create dummy rows that get sorted to the top of the array,
# which are then overwritten with tmp
x.sort(axis=0)
x[:len(indices)] = tmp
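For integers, np.iinfo(dtype).min plays the role of -np.inf: no value can sort below it, and if it already occurs in the data, the ties are harmless because equal values are indistinguishable after sorting. Below is a sketch of the same sentinel trick that picks the sentinel from the dtype; `reorder_with_sentinel` is a hypothetical name, it assumes an integer or floating dtype, and it works on a copy rather than in place.

```python
import numpy as np

def reorder_with_sentinel(x, indices):
    """Hypothetical helper: rows `indices` first, remaining values sorted per column."""
    # Pick a value that nothing in this dtype can sort below.
    if np.issubdtype(x.dtype, np.floating):
        sentinel = -np.inf
    else:
        sentinel = np.iinfo(x.dtype).min
    out = x.copy()
    tmp = out[indices]
    out[indices] = sentinel    # dummy rows sort to the top of every column
    out.sort(axis=0)
    out[:len(indices)] = tmp   # overwrite the dummy rows with the saved rows
    return out

x = np.array([[3, 8],
              [1, 5],
              [7, 2],
              [4, 6]])
print(reorder_with_sentinel(x, np.array([1, 2])))
```

Here indices contains 1, which is less than len(indices), which is the case the EDIT targets.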

4 answers

Answer 1: score 1, 2020-05-21 14:22:44

Answer 2: score 0, 2020-05-21 14:24:28

Answer 3: score 0, 2020-05-21 15:09:37

Answer 4: score 0, 2020-05-21 18:18:15
