How to efficiently calculate score = dot(a, LeakyReLU(x_i+y_j)) for each i, j in [N]?
I have to compute score = dot(a, LeakyReLU(x_i + y_j)) for every i, j in [N], where a, x_i, y_j are D-dimensional vectors and dot() is the dot product, which outputs a scalar. So in the end, I have to obtain N x N scores.
In Keras, I implemented it as:
#given X (N x D), Y(N x D), A (D x 1)
X = tf.expand_dims(X, axis=1) #(N x 1 x D)
Y = tf.expand_dims(Y, axis=0) #(1 x N x D)
feature_sum = X + Y #(N x N x D) broadcast automatically
dense = K.dot(LeakyReLU(alpha=0.1)(feature_sum), A) # (N x N x 1)
The problem is that feature_sum is expensive in GPU memory when N, D > 1000. Is there a more efficient implementation?
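If the exact expression (activation inside the dot product) is required, one memory-bounded option is to evaluate it in row blocks rather than all at once. The sketch below is my own (the names `leaky_relu` and `score_blocked` and the block size are assumptions, not from the question); it trades the full (N, N, D) tensor for (block, N, D) slices:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # elementwise LeakyReLU, matching alpha=0.1 from the question
    return np.where(x > 0, x, alpha * x)

def score_blocked(X, Y, A, block=64, alpha=0.1):
    # Exact dot(LeakyReLU(x_i + y_j), a) for all i, j, materializing
    # only a (block, N, D) slice of the broadcast sum at a time.
    N, D = X.shape
    out = np.empty((N, N))
    for start in range(0, N, block):
        stop = min(start + block, N)
        s = X[start:stop, None, :] + Y[None, :, :]      # (b, N, D)
        out[start:stop] = (leaky_relu(s, alpha) @ A)[..., 0]
    return out
```

Peak temporary memory drops from N*N*D to block*N*D floats, at the cost of a Python-level loop; the result is bit-for-bit the same as the full broadcast version.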
The dot product distributes over the sum:
dot(LRelu(X + Y), A) = dot(LRelu(X), A) + dot(LRelu(Y), A)
(strictly, the distributive step is exact for the linear dot product; LeakyReLU itself is not additive, so with the activation inside the sum this split is an approximation). So you can do:
dense_x = K.dot(LRelu(X), A) # (N x 1)
dense_y = K.dot(LRelu(Y), A) # (N x 1)
dense_x = tf.expand_dims(dense_x, axis=1) # (N x 1 x 1)
dense_y = tf.expand_dims(dense_y, axis=0) # (1 x N x 1)
dense = dense_x + dense_y # (N x N x 1)
This way, every operation runs over at most N x D elements, and you only ever need to store at most N x N elements (assuming N > D).
import time

import numpy as np

def timeit(func):
    def run(*args, **kwargs):
        start = time.time()
        out = func(*args, **kwargs)
        end = time.time()
        print(f"Exec: {(end - start) * 1000:.4f}ms")
        return out
    return run
@timeit
def slow(X, Y, A, N, D):
    # materializes the full (N, N, D) broadcast sum
    X = X.reshape(N, 1, D)
    Y = Y.reshape(1, N, D)
    feature_sum = X + Y
    dense = feature_sum @ A  # (N, N, 1)
    return dense
@timeit
def fast(X, Y, A, N, D):
    # projects to (N, 1) first; only the (N, N, 1) result is broadcast
    dense_x = X @ A
    dense_y = Y @ A
    dense_x = dense_x.reshape(N, 1, 1)
    dense_y = dense_y.reshape(1, N, 1)
    dense = dense_x + dense_y
    return dense
def main():
    N = 1000
    D = 500
    X = np.random.rand(N, D)
    Y = np.random.rand(N, D)
    A = np.random.rand(D, 1)
    dense1 = slow(X, Y, A, N, D)
    dense2 = fast(X, Y, A, N, D)
    print("Same result: ", np.allclose(dense1, dense2))

if __name__ == "__main__":
    main()
Output:
Exec: 1547.9290ms # slow
Exec: 2.9860ms # fast
Same result: True
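One caveat on the benchmark: both `slow` and `fast` above omit the LeakyReLU, for which the split is exact because the dot product is linear. With the activation included, splitting per term is only an approximation, since LeakyReLU is not additive. A one-dimensional counterexample (my own sketch, using a `leaky_relu` helper with the question's alpha=0.1):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # elementwise LeakyReLU with alpha=0.1, as in the question
    return np.where(x > 0, x, alpha * x)

a = np.array([1.0])
x = np.array([2.0])
y = np.array([-1.0])

exact = float(leaky_relu(x + y) @ a)                  # LeakyReLU(1) = 1.0
split = float(leaky_relu(x) @ a + leaky_relu(y) @ a)  # 2.0 + (-0.1) = 1.9
print(exact, split)
```

In practice the split may be acceptable when the summed features are mostly positive, or the model can simply be defined with the activation applied after the pairwise sum (LeakyReLU(dense_x + dense_y)), which keeps the same O(N x D) cost and is exact for that formulation.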