Rearrange 2D NumPy Array Efficiently
Let's say I have a 2D NumPy array:
x = np.random.rand(100, 100000)
And I retrieve the column-wise sorted indices (i.e., each column is sorted independently of the others and the indices are returned):
idx = np.argsort(x, axis=0)
Then, for each column, I need the values at indices = [10, 20, 30, 40, 50] to occupy the first 5 rows (of that column), followed by the rest of the sorted values (not the indices).
A naive approach might be:
indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)  # we store values, so match x's dtype
for col in range(x.shape[1]):
    # For each column, fill the first few rows with the values at `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices
    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):  # walk every sorted position in this column
        if idx[row, col] not in indices:
            out[n, col] = x[idx[row, col], col]  # Again, note that we want the value, not the index
            n += 1
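To make the target layout concrete, here is a small sketch on a 5x2 array. The helper name `reorder_reference` is made up for illustration, and it assumes `indices` refers to row positions in the original `x`, as the loop above does. It is slow on purpose and only meant as a reference to check faster versions against.

```python
import numpy as np

def reorder_reference(x, indices):
    """Slow, explicit reference (hypothetical helper, not from the question)."""
    k = len(indices)
    out = np.empty_like(x)
    for col in range(x.shape[1]):
        column = x[:, col]
        out[:k, col] = column[indices]      # chosen rows first, in the given order
        rest = np.delete(column, indices)   # drop the rows at positions `indices`
        out[k:, col] = np.sort(rest)        # remaining values in sorted order
    return out

x = np.array([[5, 1],
              [3, 9],
              [8, 2],
              [0, 7],
              [6, 4]])
indices = np.array([3, 1])
out = reorder_reference(x, indices)
# out is
# [[0 7]
#  [3 9]
#  [5 1]
#  [6 2]
#  [8 4]]
```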
Approach #1
Here's one based on a previous post that doesn't need idx:
xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
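In case the trick is not obvious: the rows at `indices` are overwritten in a copy with values smaller than anything in `x`, increasing in the order of `indices`, so `argsort` places exactly those rows first. A small check of that reading, assuming float data without NaNs (the names `sentinels` and `order` are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 4))
indices = np.array([5, 2, 6])

xc = x.copy()
# min-3, min-2, min-1: strictly below every value in x, increasing with position in `indices`
sentinels = xc.min() - np.arange(len(indices), 0, -1)
xc[indices] = sentinels[:, None]            # broadcast one sentinel per chosen row

order = xc.argsort(0)                       # chosen rows come first in every column
out = np.take_along_axis(x, order, axis=0)  # gather from the untouched x

# The first rows of `order` are just `indices`, repeated for every column
print(order[:len(indices)])
```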
Approach #2
Another with np.isin masking that uses idx:
mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
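The double transpose is there because boolean masking flattens in row-major order. Masking `idx.T` keeps the surviving entries grouped column by column, and since every column of `idx` is a permutation of the row numbers, each column loses exactly `len(indices)` entries, so the reshape is safe. A step-by-step sketch of the same idea on a small array (the variable `kept` is my own name):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((7, 3))
indices = np.array([4, 0])
idx = np.argsort(x, axis=0)

mask = np.isin(idx, indices)              # True where a sorted position points at a chosen row
kept = idx.T[~mask.T]                     # 1D: survivors of column 0, then column 1, ...
kept = kept.reshape(x.shape[1], -1).T     # back to (n_rows - len(indices), n_cols)

p2 = np.take_along_axis(x, kept, axis=0)  # remaining values, still in sorted order
out = np.vstack((x[indices], p2))
```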
Approach #2 - Alternative
If you are continuously editing into out to change everything except those indices, an array assignment might be the one for you:
n = len(indices)
out[:n] = x[indices]
mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower
This should help you get started by eliminating the innermost loop and the if condition. To start off, you can pass in x[:, col] as the input parameter x.
def custom_ordering(x, idx, indices):
    # x and idx are single columns here, e.g. x[:, col] and idx[:, col]
    # First get only the desired indices at the top
    out = x[indices]
    # drop the entries of `idx` whose values are in `indices`
    # (np.delete(idx, indices) would remove by position, not by value)
    idx2 = idx[~np.isin(idx, indices)]
    # select `idx2` rows and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)
    return out
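A per-column helper like this still needs an outer loop over the columns. A minimal sketch of that driver follows, with a compact 1D helper I named `reorder_column` so the block runs on its own; the name and the small test array are assumptions for illustration only.

```python
import numpy as np

def reorder_column(col, col_idx, indices):
    # Hypothetical 1D helper: values at `indices` first, then the other values in sorted order
    rest = col_idx[~np.isin(col_idx, indices)]
    return np.concatenate((col[indices], col[rest]))

rng = np.random.default_rng(2)
x = rng.random((6, 3))
idx = np.argsort(x, axis=0)
indices = np.array([1, 4])

out = np.empty_like(x)
for c in range(x.shape[1]):
    # pass one column of x and the matching column of idx
    out[:, c] = reorder_column(x[:, c], idx[:, c], indices)
```

With 100000 columns the Python-level loop will likely dominate the runtime, so this is more of a stepping stone than a final answer.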
Here is my solution to the problem:
rem_indices = [_ for _ in range(x.shape[0]) if _ not in indices] # get all remaining indices
xs = np.take_along_axis(x, idx, axis = 0) # the sorted array
out = np.empty(x.shape)
out[:indices.size, :] = xs[indices, :] # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :] # insert the remaining values after the previous
Tell me if I understood your problem correctly.
I do this with a smaller array and fewer indices so that I can easily sanity-check the results, but it should translate to your use case. I think this solution is decently efficient as everything is done in place.
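One thing worth checking here is what `indices` refers to. The snippet above indexes the sorted array `xs`, so it picks values by rank within each column, while the loop in the question indexes `x` directly and picks by original row. A tiny sketch of the difference, on a one-column array I made up:

```python
import numpy as np

x = np.array([[30.], [10.], [20.], [0.]])
indices = np.array([0])

idx = np.argsort(x, axis=0)
xs = np.take_along_axis(x, idx, axis=0)  # column sorted ascending: 0, 10, 20, 30

by_rank = xs[indices]  # smallest value of the column -> [[0.]]
by_row = x[indices]    # value stored in original row 0 -> [[30.]]
print(by_rank, by_row)
```

If the original-row reading is the intended one, the first block of rows would come from `x[indices]` rather than `xs[indices]`.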
import numpy as np
x = np.random.randint(10, size=(12,3))
indices = np.array([5,7,9])
# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0)
Here it is with the output:
>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
[7, 4, 6],
[6, 5, 2],
[6, 8, 4],
[2, 0, 2],
[3, 0, 4], # 5th row
[4, 7, 4],
[3, 1, 1], # 7th row
[3, 5, 3],
[0, 5, 9]]) # 9th row
>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4], #
[3, 1, 1], # Top of array looks like above
[0, 5, 9], #
[6, 8, 4],
[2, 0, 2],
[7, 1, 8], # Previous top row
[4, 7, 4],
[7, 4, 6], # Previous second row
[3, 5, 3],
[6, 5, 2]]) # Previous third row
>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
[3, 1, 1], # Top is the same, below is sorted
[0, 5, 9], #
[2, 0, 2],
[3, 1, 2],
[4, 4, 3],
[6, 5, 4],
[6, 5, 4],
[7, 7, 6],
[7, 8, 8]])
EDIT: This version should handle the case where any element of indices is less than len(indices):
import numpy as np
x = np.random.randint(10, size=(12,3))
indices = np.array([1,2,4])
tmp = x[indices]
# Here I just assume that there aren't any values less or equal to -1. If you use
# float, you can use -np.inf, but there is no such equivalent for ints (which I
# use in my example).
x[indices] = -1
# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0)
x[:len(indices)] = tmp  # after sorting, the dummy rows sit in the top len(indices) rows
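For float data, the `-np.inf` placeholder mentioned in the comments could look roughly like the sketch below. It assumes `x` has no NaNs or existing `-inf` values. Note that the saved rows are written back to the top `len(indices)` rows, since that is where the placeholders land once each column is sorted.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random((12, 3))
original = x.copy()            # kept only to verify the result afterwards
indices = np.array([1, 2, 4])

tmp = x[indices]               # fancy indexing returns a copy, so this is safe to keep
x[indices] = -np.inf           # placeholders that sort to the top of every column
x.sort(axis=0)                 # in-place, column-wise
x[:len(indices)] = tmp         # overwrite the placeholder rows with the saved rows
```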