
Order of elements in a numpy array

I have a 2-D array of shape (n, 3), say arr1. Now consider a second array, arr2, with the same shape and the same rows as arr1, but with the rows in a different order. I want to get, for each row of arr2, its index in arr1. I am looking for the fastest Pythonic way to do this, as n is on the order of 10,000.

For example:

arr1 = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = numpy.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
ind = [1, 2, 0]

Note that the row elements need not be integers; in fact, they are floats. I have found related answers that use numpy.searchsorted, but they work only for 1-D arrays.

If you can ensure that arr2 is a permutation of arr1, you can use sorting to get the index:

import numpy as np

n = 100000
a1 = np.random.randint(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))]
idx1 = np.lexsort(a1.T)
idx2 = np.lexsort(a2.T)
idx = idx2[np.argsort(idx1)]
np.all(a1 == a2[idx])
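
To see why the idx2[np.argsort(idx1)] step works, here is a small worked sketch on the arrays from the question (converted to floats). The names order1, order2 and rank1 are introduced only for this illustration. Note that the snippet above produces idx such that a2[idx] equals a1; swapping the roles of the two arrays gives the ind asked for in the question.

```python
import numpy as np

arr1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
arr2 = np.array([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])

# lexsort returns the order that sorts the rows; the last key passed
# (here the last column) is the primary sort key.
order1 = np.lexsort(arr1.T)
order2 = np.lexsort(arr2.T)

# argsort of a permutation is its inverse: rank1[i] is the position
# of arr1[i] in the sorted order.
rank1 = np.argsort(order1)

# arr1[i] sits at sorted position rank1[i]; order2 tells which row of
# arr2 occupies that same sorted position.
idx = order2[rank1]                 # arr2[idx] == arr1

# Swapping the roles gives the index in arr1 of each row of arr2.
ind = order1[np.argsort(order2)]    # arr1[ind] == arr2

print(idx.tolist())  # [2, 0, 1]
print(ind.tolist())  # [1, 2, 0]
```

Both arrays are sorted into the same order, so rows at the same sorted position are equal; the rest is bookkeeping to map sorted positions back to original indices.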

If they don't have exactly the same values, you can use a k-d tree (cKDTree) from scipy:

n = 100000

a1 = np.random.uniform(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))] + np.random.normal(0, 1e-8, size=(n, 3))
from scipy import spatial
tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)
np.allclose(a1, a2[idx])
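
Since the nearest-neighbour query always returns some index, it may be worth checking the returned distances as well. The following is a minimal sketch under assumptions: the tolerance tol is a hypothetical value that you would choose from the noise level of your own data, and the seeded generator rng is only there to make the example reproducible.

```python
import numpy as np
from scipy import spatial

rng = np.random.default_rng(0)
n = 1000
a1 = rng.uniform(0, 100, size=(n, 3))
perm = rng.permutation(n)
# a2 holds the rows of a1 in shuffled order, with tiny perturbations.
a2 = a1[perm] + rng.normal(0, 1e-8, size=(n, 3))

tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)   # nearest row of a2 for every row of a1

tol = 1e-6                   # hypothetical tolerance, adjust to your data
matched = dist <= tol        # True where the nearest row is close enough
# Each row of a2 should be used exactly once if the match is one-to-one.
one_to_one = np.unique(idx).size == n

print(matched.all(), one_to_one)
```

If matched is not all True, or the mapping is not one-to-one, the two arrays probably differ by more than noise, or they contain near-duplicate rows.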

Before we begin, you should mention whether duplicates can exist in your arrays.

That said, the method I would use is numpy's where function within a list comprehension, like so:

[numpy.where((arr1 == x).all(axis=1))[0][0] for x in arr2]

This might not be the fastest way, though. Another method might be to build a dictionary from the rows in arr1 and then look up the rows of arr2 in it.
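
The dictionary idea could look roughly like the sketch below. The helper name find_rows_dict is hypothetical, and the sketch assumes that every row of arr2 appears in arr1 with exactly the same float values (no rounding differences and no NaN); a missing row raises KeyError. If arr1 contains duplicate rows, the first index is returned.

```python
import numpy as np

def find_rows_dict(a, b):
    """For each row of a, return the index of the identical row in b."""
    # Map each row of b, converted to a hashable tuple, to its first index.
    lookup = {}
    for i, row in enumerate(b):
        lookup.setdefault(tuple(row.tolist()), i)
    # Look up every row of a; raises KeyError if a row is not in b.
    return np.array([lookup[tuple(row.tolist())] for row in a])

arr1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
arr2 = np.array([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])

ind = find_rows_dict(arr2, arr1)   # index in arr1 of each row of arr2
print(ind.tolist())  # [1, 2, 0]
```

Building the dictionary and doing the lookups are both linear in n, but the work happens in a Python loop, so a vectorized sort-based approach may still be faster in practice.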

This is very similar to the question "Find indexes of matching rows in two 2-D arrays", but I don't have the reputation to leave a comment.

However, based on that question, there appear to be two clear possibilities for a large matrix like yours (a list-comprehension variant is included for comparison):

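def find_rows_searchsorted(a, b):
    # View each row as one opaque (void) item so that whole rows can be
    # sorted and searched like the elements of a 1-D array.
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))

    a_view = np.ascontiguousarray(a).view(dt).ravel()
    b_view = np.ascontiguousarray(b).view(dt).ravel()

    # Locate every row of a among the sorted rows of b, then map the
    # sorted positions back to indices into b.
    sort_b = np.argsort(b_view)
    where_in_b = np.searchsorted(b_view, a_view, sorter=sort_b)
    return np.take(sort_b, where_in_b)

def find_rows_iterative(a, b):
    # Brute force: compare each row of a against all rows of b.
    answer = np.empty(a.shape[0], dtype=int)
    for idx, row in enumerate(a):
        # [0][0] picks the first matching row index as a scalar.
        answer[idx] = np.where(np.equal(b, row).all(1))[0][0]

    return answer

def find_rows_list_comprehension(a, b):
    # Note: b == x is elementwise, so this returns the first row of b in
    # which any element matches x in the same column. It is only reliable
    # when no other row shares a value with x in the same column.
    return [np.where(b == x)[0][0] for x in a]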

A quick timing with a matrix of 10,000 rows shows that the searchsorted-based method is significantly faster than the brute-force iterative methods:

arr1 = np.random.randn(10000, 3)
new_inds = np.arange(arr1.shape[0])
np.random.shuffle(new_inds)
arr2 = arr1[new_inds, :]

np.array_equal(find_rows_searchsorted(arr2, arr1), new_inds)
>> True

np.array_equal(find_rows_iterative(arr2, arr1), new_inds)
>> True

np.array_equal(find_rows_list_comprehension(arr2, arr1), new_inds)
>> True

%timeit find_rows_iterative(arr2, arr1)
>> 1 loops, best of 3: 2.62 s per loop

%timeit find_rows_list_comprehension(arr2, arr1)
>> 1 loops, best of 3: 1.61 s per loop

%timeit find_rows_searchsorted(arr2, arr1)
>> 100 loops, best of 3: 6.53 ms per loop

Building on HYRY's great answer, I also added lexsort and k-d tree (cKDTree) tests, as well as a test of argsort on structured (record) arrays.

def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]

def find_rows_argsort(a, b):
    a_rec = np.rec.fromarrays(a.transpose())
    b_rec = np.rec.fromarrays(b.transpose())
    idx1 = a_rec.argsort(order=a_rec.dtype.names).argsort()
    return b_rec.argsort(order=b_rec.dtype.names)[idx1]

def find_rows_kdball(a, b):
    from scipy import spatial
    tree = spatial.cKDTree(b)
    _, idx = tree.query(a)
    return idx

%timeit find_rows_lexsort(arr2, arr1)
>> 100 loops, best of 3: 4.63 ms per loop

%timeit find_rows_argsort(arr2, arr1)
>> 100 loops, best of 3: 7.37 ms per loop

%timeit find_rows_kdball(arr2, arr1)
>> 100 loops, best of 3: 18.5 ms per loop
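
The %timeit lines above need IPython. Outside IPython, a comparable measurement can be made with the standard timeit module. The following is a minimal sketch for the lexsort variant only; the seeded generator rng and the repeat counts are my own choices, and the numbers it prints will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]

rng = np.random.default_rng(0)
arr1 = rng.standard_normal((10000, 3))
new_inds = rng.permutation(arr1.shape[0])
arr2 = arr1[new_inds]

# Check correctness before timing anything.
result = find_rows_lexsort(arr2, arr1)
correct = np.array_equal(result, new_inds)

# Best of 3 repeats of 10 calls each, reported per call.
runs = timeit.repeat(lambda: find_rows_lexsort(arr2, arr1), number=10, repeat=3)
best = min(runs) / 10
print(f"correct: {correct}, lexsort: {best * 1e3:.2f} ms per call")
```

The other find_rows_* functions can be timed the same way by swapping the function inside the lambda.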
