How to customize a tensor operation in TensorFlow, similar to tf.matmul?
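I am working with self-attention and came across the paper "Quantifying Attention Flow in Transformers" (https://arxiv.org/pdf/2005.00928.pdf).
I am trying to compute attention flow as proposed in the paper; see also https://samiraabnar.github.io/articles/2020-04/attention_flow
One of the authors has a GitHub repository, https://github.com/samiraabnar/attention_flow, which uses networkx to compute attention flow. However, it is very slow on long sequences.
Long story short, I want to use TensorFlow and GPU acceleration to speed up the computation. However, there is no built-in tf op that does this directly.
In particular, computing the maximum flow requires an operation that resembles matrix multiplication.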
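I want a tensor operation similar to y = tf.matmul(x1, x2), where tf.matmul returns a tensor y with y[...,i,j] = sum(x1[...,i,:] * x2[...,:,j]).
However, instead of the dot product, I want to define a new operation that replaces the multiplication with a maximum, i.e. y = some_tf_op(x1, x2), where y[...,i,j] = sum(tf.maximum(x1[...,i,:], x2[...,:,j])).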
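To pin down the intended semantics, here is a minimal pure-Python reference for the 2-D case. The name max_sum_product is made up for this illustration and is not a TensorFlow API; it is far too slow for real use and only serves as a ground truth to check a tensor implementation against.

```python
def max_sum_product(x1, x2):
    """Reference semantics for 2-D inputs given as nested lists.

    y[i][j] = sum over k of max(x1[i][k], x2[k][j])
    (max_sum_product is a hypothetical name used only for this sketch.)
    """
    n = len(x2)     # shared inner dimension
    p = len(x2[0])  # number of columns of x2
    return [
        [sum(max(row[k], x2[k][j]) for k in range(n)) for j in range(p)]
        for row in x1
    ]


A = [[1, 2, 3], [4, 5, 6]]
B = [[6, 3], [5, 2], [4, 1]]
print(max_sum_product(A, B))  # [[15, 8], [17, 15]]
```

For instance, the top-left entry is max(1,6) + max(2,5) + max(3,4) = 6 + 5 + 4 = 15.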
I understand that this is doable outside of graph computation; however, I would like to place it inside a graph computation (e.g. the call function of a tf.keras.layers.Layer or a tf.keras.Model) without using too many resources.
For example:
import tensorflow as tf
from IPython.display import display  # already available inside a notebook

# A is a tensor with shape (..., m, n)
A = tf.constant([[1,2,3],[4,5,6]])
display('A',A)
# B is a tensor with shape (..., n, p)
B = tf.constant([[6,3],[5,2],[4,1]])
display('B',B)

@tf.function
def some_tf_op(x1,x2):
    ...
    return output

C = some_tf_op(A, B)
display('C',C)
Expected output:
A
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)>
B
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[6, 3],
       [5, 2],
       [4, 1]], dtype=int32)>
C
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[15,  8],
       [17, 15]], dtype=int32)>
I have worked out a solution, but it seems very expensive, especially when working with large tensors.
import tensorflow as tf
from IPython.display import display  # already available inside a notebook

# A is a tensor with shape (..., m, n)
A = tf.reshape(tf.range(3*4*5), (3,4,5))
display('A',A)
# B is a tensor with shape (..., n, p)
B = tf.transpose(tf.reshape(tf.range(3*6*5,0,-1), (3,6,5)), (0,2,1))
display('B',B)

@tf.function
def some_tf_op(x1,x2):
    A = x1
    B = x2
    # Swap the last two axes of B: (..., n, p) -> (..., p, n)
    B = tf.einsum('...ij->...ji',B)
    # Tile A to shape (..., m, p, n)
    A = tf.expand_dims(A, axis = -2)
    A = tf.repeat(A, axis = -2, repeats = tf.shape(B)[-2])
    # Tile B to shape (..., m, p, n)
    B = tf.expand_dims(B, axis = -3)
    B = tf.repeat(B, axis = -3, repeats = tf.shape(A)[-3])
    # Element-wise maximum, then sum over the shared axis n
    C = tf.reduce_sum(tf.math.maximum(A,B), axis = -1)
    return C

C = some_tf_op(A, B)
# C is a tensor with shape (..., m, p)
display('C',C)
Output:
A
<tf.Tensor: shape=(3, 4, 5), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]], dtype=int32)>
B
<tf.Tensor: shape=(3, 5, 6), dtype=int32, numpy=
array([[[90, 85, 80, 75, 70, 65],
        [89, 84, 79, 74, 69, 64],
        [88, 83, 78, 73, 68, 63],
        [87, 82, 77, 72, 67, 62],
        [86, 81, 76, 71, 66, 61]],

       [[60, 55, 50, 45, 40, 35],
        [59, 54, 49, 44, 39, 34],
        [58, 53, 48, 43, 38, 33],
        [57, 52, 47, 42, 37, 32],
        [56, 51, 46, 41, 36, 31]],

       [[30, 25, 20, 15, 10,  5],
        [29, 24, 19, 14,  9,  4],
        [28, 23, 18, 13,  8,  3],
        [27, 22, 17, 12,  7,  2],
        [26, 21, 16, 11,  6,  1]]], dtype=int32)>
C
<tf.Tensor: shape=(3, 4, 6), dtype=int32, numpy=
array([[[440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315]],

       [[290, 265, 240, 215, 190, 165],
        [290, 265, 240, 215, 190, 165],
        [290, 265, 240, 215, 190, 169],
        [290, 265, 240, 215, 194, 185]],

       [[210, 210, 210, 210, 210, 210],
        [235, 235, 235, 235, 235, 235],
        [260, 260, 260, 260, 260, 260],
        [285, 285, 285, 285, 285, 285]]], dtype=int32)>
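One possible simplification, offered as a sketch rather than a definitive answer: tf.maximum broadcasts its arguments, so the explicit tf.repeat calls may not be needed. The function name max_sum_matmul below is made up for this illustration. It assumes x1 has shape (..., m, n) and x2 has shape (..., n, p) with matching leading dimensions.

```python
import tensorflow as tf


@tf.function
def max_sum_matmul(x1, x2):
    """y[..., i, j] = sum_k max(x1[..., i, k], x2[..., k, j]).

    max_sum_matmul is a hypothetical name, not a TensorFlow op.
    """
    # (..., m, n) -> (..., m, 1, n)
    a = tf.expand_dims(x1, axis=-2)
    # (..., n, p) -> (..., p, n) -> (..., 1, p, n)
    b = tf.expand_dims(tf.linalg.matrix_transpose(x2), axis=-3)
    # Broadcasting pairs every row of x1 with every column of x2,
    # giving shape (..., m, p, n); then sum over the shared axis n.
    return tf.reduce_sum(tf.maximum(a, b), axis=-1)


A = tf.constant([[1, 2, 3], [4, 5, 6]])
B = tf.constant([[6, 3], [5, 2], [4, 1]])
print(max_sum_matmul(A, B).numpy())
```

On the small example this should reproduce the expected [[15, 8], [17, 15]]. Note that tf.maximum still produces an intermediate of shape (..., m, p, n), so memory grows with m * p * n and may remain the bottleneck for long sequences; whether this is actually faster than the tf.repeat version would need to be measured.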