Finding unique points in a numpy array

Question

What is a faster way to find the unique x, y points (removing duplicates) in a numpy array such as:

points = numpy.random.randint(0, 5, (10,2))

I thought of converting the points to complex numbers and then checking for uniqueness, but that seems rather convoluted:

b = numpy.unique(points[:,0] + 1j * points[:,1])
points = numpy.column_stack((b.real, b.imag))

Answer 1

I would do it like this:

numpy.array(list(set(tuple(p) for p in points)))

For a fast solution in the most general case, this recipe may interest you: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
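One thing to keep in mind with the set-based one-liner is that a set does not preserve the order in which points first appear. If that matters, a minimal sketch along the same lines could use `dict.fromkeys`, whose keys keep insertion order in Python 3.7+. The helper name `unique_rows_ordered` is hypothetical and not from the answer or the linked recipe.

```python
import numpy as np

def unique_rows_ordered(points):
    """Hypothetical helper: drop duplicate rows, keeping first-seen order."""
    # Rows become tuples so they are hashable; dict keys keep insertion order.
    keys = dict.fromkeys(map(tuple, points.tolist()))
    # reshape keeps the result 2-D even when the input has no rows.
    return np.array(list(keys), dtype=points.dtype).reshape(-1, points.shape[1])

points = np.array([[3, 4], [0, 1], [3, 0], [0, 1], [4, 4]])
print(unique_rows_ordered(points))
```

Like the set version, this loops over the rows in Python, so it trades speed for simplicity on large arrays.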

Answer 2

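I think you have a great idea here. Think about the underlying block of memory used to represent the data in points. We tell numpy to regard that block as an array of shape (10,2) with dtype int32 (32-bit integers), but it is almost free to tell numpy to regard the same block as an array of shape (10,1) with dtype c8 (64-bit complex), since each row of two int32 values occupies exactly the 8 bytes of one c8 element.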

So the only real cost is the call to np.unique, followed by another almost free call to view and reshape:

import numpy as np
np.random.seed(1)
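# Force int32 so that each row is 8 bytes, the size of one 'c8' element
# (randint returns int64 by default on many platforms).
points = np.random.randint(0, 5, (10,2)).astype('i4')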
print(points)
print(len(points))

yields

[[3 4]
 [0 1]
 [3 0]
 [0 1]
 [4 4]
 [1 2]
 [4 2]
 [4 3]
 [4 2]
 [4 2]]
10

while

cpoints = points.view('c8')
cpoints = np.unique(cpoints)
points = cpoints.view('i4').reshape((-1,2))
print(points)
print(len(points))

yields

[[0 1]
 [1 2]
 [3 0]
 [3 4]
 [4 2]
 [4 3]
 [4 4]]
7
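The claim that the view is almost free can be checked directly. The sketch below assumes the data is explicitly int32 (so one row is 8 bytes) and uses `np.shares_memory` to confirm that no copy is made; the small array is made up for illustration.

```python
import numpy as np

# Assumption: dtype is int32, so each (x, y) row is exactly 8 bytes,
# the same size as one complex64 ('c8') element.
points = np.array([[3, 4], [0, 1], [3, 0], [0, 1]], dtype=np.int32)

cpoints = points.view('c8')                 # reinterpret the bytes, no copy
print(cpoints.shape)                        # (4, 1)
print(np.shares_memory(points, cpoints))    # True

# Viewing back as int32 recovers the original rows.
back = cpoints.view('i4').reshape(-1, 2)
print(np.array_equal(back, points))         # True
```

Note that the complex values themselves are meaningless (the integer bits are reinterpreted as floats); the view only serves to make each row a single comparable item.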

If you don't need the result to be sorted, wim's method is faster (you might want to consider accepting his answer...)

import numpy as np
np.random.seed(1)
N=10000
# Force int32 so that each row is 8 bytes, the size of one 'c8' element.
points = np.random.randint(0, 5, (N,2)).astype('i4')

def using_unique():
    cpoints = points.view('c8')
    cpoints = np.unique(cpoints)
    return cpoints.view('i4').reshape((-1,2))

def using_set():
    return np.vstack([np.array(u) for u in set([tuple(p) for p in points])])

yields these benchmarks:

% python -mtimeit -s'import test' 'test.using_set()'
100 loops, best of 3: 18.3 msec per loop
% python -mtimeit -s'import test' 'test.using_unique()'
10 loops, best of 3: 40.6 msec per loop
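For readers on a newer NumPy: since version 1.13, `np.unique` accepts an `axis` argument, so duplicate rows can be removed without the complex-number view. This is not one of the methods benchmarked above, just a sketch of an alternative; the function name `using_unique_axis` is made up here, and no timing is claimed since it depends on the machine and NumPy version.

```python
import timeit
import numpy as np

np.random.seed(1)
points = np.random.randint(0, 5, (10000, 2))

def using_unique_axis():
    # axis=0 treats each row as one item; rows come back sorted lexicographically.
    return np.unique(points, axis=0)

result = using_unique_axis()
print(result.shape)

# Illustrative timing only; compare against the other functions on your own setup.
print(timeit.timeit(using_unique_axis, number=10))
```

Unlike the view trick, this works for any integer dtype and any number of columns.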

Solution 1
8 Accepted 2011-11-03 04:36:10

Solution 2
7 2011-11-03 03:14:26