Finding the intersection of two matrices in Python within a tolerance?

Question

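I am looking for the most efficient way to find the intersection of two matrices of different sizes. Each matrix has three variables (columns) and a varying number of observations (rows). For example, matrices a and b: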

import numpy as np
a = np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
b = np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')

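If I set the tolerance for each column to col1 = 1, col2 = 2 and col3 = 10, I want a function that outputs the indices of the rows in a and b that fall within those tolerances of each other, for example: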

[x1, x2] = func(a, b, col1, col2, col3)
print(x1)
>> [2 3]
print(x2)
>> [1 3]

You can see from the indices that element 2 of a is within the tolerances of element 1 of b.

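I suppose I could loop over each element of matrix a, check whether it is within the tolerances of each element of b, and do it that way. But that seems inefficient for very large datasets.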
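For reference, the loop described above might look something like the following sketch. The function name `match_rows_loop` is hypothetical, and the strict `<` comparison is an assumption about what "within tolerance" means; it serves only as a baseline that compares every row of a against every row of b.

```python
import numpy as np

def match_rows_loop(a, b, tol):
    """Naive baseline: compare every row of a with every row of b."""
    a = np.asarray(a)
    b = np.asarray(b)
    idx_a, idx_b = [], []
    for i, row_a in enumerate(a):
        for j, row_b in enumerate(b):
            # A pair matches only if every column differs by less than its tolerance
            if all(abs(x - y) < t for x, y, t in zip(row_a, row_b, tol)):
                idx_a.append(i)
                idx_b.append(j)
    return idx_a, idx_b

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_rows_loop(a, b, tol=[1, 2, 10])
print(x1, x2)
```

The work grows with the product of the two row counts, which is the inefficiency in question.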

Any suggestions for alternatives to this looping approach?

Answer 1

If you don't mind working with NumPy arrays, you can use broadcasting for a vectorized solution. Here's the implementation:

# Set tolerance values for each column
tol = [1, 2, 10]

# Get absolute differences between a and b keeping their columns aligned
diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))

# Compare each row with the triplet from `tol`.
# Get mask of all matching rows and finally get the matching indices
x1,x2 = np.nonzero((diffs < tol).all(2))

Sample run:

In [46]: # Inputs
    ...: a=np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
    ...: b=np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
    ...: 

In [47]: # Set tolerance values for each column
    ...: tol = [1, 2, 10]
    ...: 
    ...: # Get absolute differences between a and b keeping their columns aligned
    ...: diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
    ...: 
    ...: # Compare each row with the triplet from `tol`.
    ...: # Get mask of all matching rows and finally get the matching indices
    ...: x1,x2 = np.nonzero((diffs < tol).all(2))
    ...: 

In [48]: x1,x2
Out[48]: (array([2, 3]), array([1, 3]))
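
For reuse, the broadcasting approach could be wrapped in a function along the lines of the `func` asked for in the question. This is only a minimal sketch: the name `match_within_tol` and its signature are hypothetical, and it keeps the strict `<` comparison used above (switch to `<=` if a difference exactly equal to the tolerance should count as a match).

```python
import numpy as np

def match_within_tol(a, b, tol):
    """Return index arrays (ia, ib) of row pairs whose columns all differ by less than tol."""
    a = np.asarray(a)      # also accepts np.matrix input
    b = np.asarray(b)
    tol = np.asarray(tol)
    # (na, 1, ncols) - (1, nb, ncols) broadcasts to (na, nb, ncols)
    diffs = np.abs(a[:, None, :] - b[None, :, :])
    # A pair matches when every column is inside its tolerance
    return np.nonzero((diffs < tol).all(axis=2))

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_within_tol(a, b, tol=[1, 2, 10])
print(x1, x2)
```

The intermediate `diffs` array has one entry per (row of a, row of b, column) combination, so its size grows with the product of the two row counts.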

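Large dataset case: If you are working with data large enough to cause memory problems, and since you already know the number of columns is 3, you may prefer a minimal loop of just 3 iterations, which saves a lot of memory, like so: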

na = a.shape[0]
nb = b.shape[0]
accum = np.ones((na,nb),dtype=bool)
for i in range(a.shape[1]):
    accum &=  np.abs((a[:,i] - b[:,i].ravel())) < tol[i]
x1,x2 = np.nonzero(accum)
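
Note that the loop above relies on `a` and `b` being `np.matrix` objects, whose column slices stay 2-D. With plain ndarrays, `a[:,i]` and `b[:,i].ravel()` are both 1-D and would not broadcast into an `(na, nb)` grid. Below is a sketch of the same column-by-column idea that converts the inputs to plain arrays first; the function name `match_within_tol_lowmem` is hypothetical.

```python
import numpy as np

def match_within_tol_lowmem(a, b, tol):
    """Column-by-column variant: only one (na, nb) slab of differences exists at a time."""
    a = np.asarray(a)
    b = np.asarray(b)
    accum = np.ones((a.shape[0], b.shape[0]), dtype=bool)
    for i in range(a.shape[1]):
        # a[:, i, None] has shape (na, 1); b[:, i] has shape (nb,)
        # so the difference broadcasts to (na, nb)
        accum &= np.abs(a[:, i, None] - b[:, i]) < tol[i]
    return np.nonzero(accum)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_within_tol_lowmem(a, b, tol=[1, 2, 10])
print(x1, x2)
```

The boolean `accum` mask still has `na * nb` entries, so this reduces peak memory rather than removing the dependence on both row counts.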

Answer 1: 5, accepted, 2015-11-04 06:14:50
