How do I use numpy to find the unions between 2D and 3D arrays?
I have a numpy array of integer tuples of length 3 (x) and a numpy array of integer tuples of length 2 (y).
x = numpy.array([[3, 4, 5], [5, 12, 13], [6, 8, 10], [7, 24, 25]]) #first 4 elem
y = numpy.array([[3, 4], [4, 5], [3, 5], [5, 12]]) # first 4 elem
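I am trying to find the elements of array y of the form [a, b], [b, c] and [a, c] that are subsets of a single element [a, b, c] of array x. I call this function union. My loop-heavy method for finding the union is shown below. It does not scale well to arrays that contain at least 200K elements.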
def union(x, y):
    for intx in range(len(x)):
        cond1 = cond2 = cond3 = 0
        for inty in range(len(y)):
            if (y[inty][0] == x[intx][0] and y[inty][1] == x[intx][1]): #[a, b] & [a, b, c]
                print ("condition 1 passed")
                cond1 = 1
            if (y[inty][0] == x[intx][1] and y[inty][1] == x[intx][2]): #[b, c] & [a, b, c]
                print ("condition 2 passed")
                cond2 = 1
            if (y[inty][0] == x[intx][0] and y[inty][1] == x[intx][2]): #[a, c] & [a, b, c]
                print ("condition 3 passed")
                cond3 = 1
        if (cond1 & cond2 & cond3):
            print("union found with ", x[intx])
        cond1 = cond2 = cond3 = 0
    return
>>> union(x,y)
condition 1 passed
condition 2 passed
condition 3 passed
union found with [3 4 5]
condition 1 passed
Update #1: Example 1: This set of x and y has no union:
x = numpy.array([[21, 220, 221]])
y = numpy.array([[21, 220], [20, 21], [220,3021], [1220,3621], [60,221]])
Update #2: Example 2: This set of x and y has no union:
x = numpy.array([[43, 924, 925]])
y = numpy.array([[43, 924], [924, 1643], [924,4307], [72, 925]])
Example 3: This is a set of x and y that has the union [4, 8, 16].
x = numpy.array([[4, 8, 16], [8, 4, 16]])
y = numpy.array([[4, 8], [8, 16], [4, 16]])
Example 4: This is a set of x and y that has the union [12, 14, 15].
x = numpy.array([[12, 13, 15], [12, 14, 15]])
y = numpy.array([[12, 14], [12, 13], [12, 15], [14, 15]])
Summary: In general, arrays x and y will have a union of [a, b, c] if
x = numpy.array([[a, b, c], ...])
y = numpy.array([[a, b], [b, c], [a, c],...])
or with the pairs in any order within y
y = numpy.array([..., [b, c], [a, c], ..., [a, b]])
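So my question: is there a simple way to do this with array operations? For example, numpy.logical_and suggests that x1 and x2 must have the same shape. Replacing my if statements with isdisjoint, which is a faster method, is not straightforward. https://stackoverflow.com/a/24478521/8275288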
If you are only interested in the "rows" of x that match your condition, you could use:
import numpy as np
def union(x, y):
    # Create a boolean mask for the rows of "x"
    res = np.ones(x.shape[0], dtype=bool)
    # Mask containing the "x" rows that have one "partial match"
    res_tmp = np.zeros(x.shape[0], dtype=bool)
    # Walk through the axis-combinations
    # you could also use Divakars "(x[:,:2], x[:,::2], x[:,1:])" here.
    for cols in (x[:, [0, 1]], x[:, [1, 2]], x[:, [0, 2]]):
        # Check each row of y if it has a partial match
        for y_row in y:
            res_tmp |= (y_row == cols).all(axis=1)
        # Update the overall mask and then reset the partial match mask
        res &= res_tmp
        res_tmp[:] = 0
    return res
x = np.array([[3, 4, 5], [5, 12, 13], [6, 8, 10], [7, 24, 25]])
y = np.array([[3, 4], [4, 5], [3, 5], [5, 12]])
mask = union(x, y)
print(mask)     # [ True False False False]
print(x[mask])  # [[3 4 5]]
Or with a different y:
y = np.array([[3, 4], [4, 5], [3, 5], [5, 12], [12, 13], [5, 13]])
mask = union(x, y)
print(x[mask])
# array([[ 3, 4, 5],
# [ 5, 12, 13]])
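It still needs two nested loops, but the inner operation y_row == cols is vectorized. That should give at least some (probably a huge) speed improvement.
It is also possible to vectorize the for y_row in y loop (using broadcasting), but if your x and y arrays are very large this is unlikely to improve performance, and the intermediate result takes on the order of len(x) * len(y) memory. In some cases that is more memory than you actually have, which leads either to an exception or to very poor performance because you fall back to swap memory.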
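To make the broadcasting variant concrete, here is a minimal sketch. The name union_broadcast is made up for illustration and is not part of the answer above; the function assumes x has shape (n, 3) and y has shape (m, 2). Each comparison builds a temporary boolean array of shape (n, m, 2), which is exactly the memory cost described above.

```python
import numpy as np

def union_broadcast(x, y):
    # Hypothetical helper: same result as union(), without the loop over y.
    res = np.ones(x.shape[0], dtype=bool)
    for idx in ([0, 1], [1, 2], [0, 2]):
        cols = x[:, idx]                         # shape (n, 2)
        # Compare every pair taken from x with every row of y at once.
        eq = cols[:, None, :] == y[None, :, :]   # shape (n, m, 2)
        # A pair matches if both entries agree with some row of y.
        res &= eq.all(axis=2).any(axis=1)
    return res

x = np.array([[3, 4, 5], [5, 12, 13], [6, 8, 10], [7, 24, 25]])
y = np.array([[3, 4], [4, 5], [3, 5], [5, 12]])
print(x[union_broadcast(x, y)])  # [[3 4 5]]
```

For very large inputs, x could be processed in chunks of rows to bound the size of the temporary array.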
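The numpy_indexed package (disclaimer: I am its author) can be used to write a fairly simple vectorized version of your original code, which should be a lot more efficient: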
from functools import reduce
import numpy as np
import numpy_indexed as npi
def contains_union(x, y):
    """Returns an ndarray with a bool for each element in x,
    indicating if it can be constructed as a union of elements in y"""
    idx = [[0, 1], [1, 2], [0, 2]]
    y = npi.as_index(y)   # not required, but a performance optimization
    return reduce(np.logical_and, (npi.in_(x[:, i], y) for i in idx))
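For readers who do not have numpy_indexed installed, the same "is each pair a row of y?" test can be sketched with a plain Python set of tuples. This is an illustrative stand-in, not the package's implementation, and contains_union_sets is a made-up name. It loops over x in Python, so it will be slower than a vectorized version, but each membership check is a constant-time set lookup.

```python
import numpy as np

def contains_union_sets(x, y):
    # Hypothetical helper: put the rows of y into a set of (a, b) tuples.
    y_pairs = set(map(tuple, np.asarray(y).tolist()))
    # A row [a, b, c] qualifies when all three of its pairs occur in y.
    return np.array(
        [(a, b) in y_pairs and (b, c) in y_pairs and (a, c) in y_pairs
         for a, b, c in np.asarray(x).tolist()],
        dtype=bool,
    )

x = np.array([[12, 13, 15], [12, 14, 15]])
y = np.array([[12, 14], [12, 13], [12, 15], [14, 15]])
print(x[contains_union_sets(x, y)])  # [[12 14 15]]
```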
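If the largest value in x is smaller than the square root of the largest integer your dtype can represent (int64, say), a numeric trick may work: encode each pair as a single integer.
I used int(1e6) as the multiplier to keep the example readable, so the encoding is only unambiguous while every value is non-negative and below 1e6.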
import numpy
#rolled up all of the examples
x = numpy.array([[3, 4, 5], [5, 12, 13], [6, 8, 10], [7, 24, 25],
[21, 220, 221],
[43, 924, 925],
[4, 8, 16], [8, 4, 16],
[12, 13, 15], [12, 14, 15]]) #all examples
#and a numpy array of integer tuples of length 2:
y = numpy.array([[3, 4], [4, 5], [3, 5], [5, 12],
[21, 220], [20, 21], [220,3021], [1220,3621], [60,221],
[43, 924], [924, 1643], [924,4307], [72, 925],
[4, 8], [8, 16], [4, 16],
[12, 14], [12, 13], [12, 15], [14, 15]]) #all examples
#then make a couple of transform arrays
zx=numpy.array([[int(1e6), 1, 0],
[0, int(1e6), 1],
[int(1e6), 0, 1],
])
zy = numpy.array([[int(1e6)], [1]])
# and the magic is: numpy.intersect1d(zx @ x[ix], y @ zy)
# just to see part of what is being fed to intersect1d
print(zx @ x[0])
[3000004 4000005 3000005]
print(y[:4] @ zy)
[[3000004]
[4000005]
[3000005]
[5000012]]
# if len of the intersection == 3 then you have your match
y_zy = y @ zy # only calc once
for ix in range(len(x)):
    # the module was imported as "numpy", so call it by that name
    matches = len(numpy.intersect1d(zx @ x[ix], y_zy))
    print(ix, matches, x[ix] if matches == 3 else '')
0 3 [3 4 5]
1 2
2 0
3 0
4 1
5 1
6 3 [ 4 8 16]
7 2
8 2
9 3 [12 14 15]
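I don't know how fast intersect1d is, but according to the docs it can be sped up by passing assume_unique=True, provided your data allows it (neither input may then contain duplicates).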
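The Python loop over x can probably be dropped as well by encoding whole columns at once and testing membership with numpy.isin. The sketch below is one way to do that, not part of the answer above: the function name union_encoded and the base parameter are assumptions, and it only gives correct results when every value is a non-negative integer smaller than base.

```python
import numpy as np

def union_encoded(x, y, base=10**6):
    # Hypothetical helper; assumes 0 <= value < base for every entry.
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    # Encode each pair [a, b] of y as the single integer a * base + b.
    y_keys = y[:, 0] * base + y[:, 1]
    mask = np.ones(len(x), dtype=bool)
    for i, j in ((0, 1), (1, 2), (0, 2)):
        # Encode the same pair for every row of x and test membership in y.
        mask &= np.isin(x[:, i] * base + x[:, j], y_keys)
    return mask

x = np.array([[3, 4, 5], [5, 12, 13], [6, 8, 10], [7, 24, 25]])
y = np.array([[3, 4], [4, 5], [3, 5], [5, 12]])
print(x[union_encoded(x, y)])  # [[3 4 5]]
```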