
How to customize a tensor operation in Tensorflow similar to tf.matmul?

I am working on self-attention and came across the paper "Quantifying Attention Flow in Transformers": https://arxiv.org/pdf/2005.00928.pdf

I am trying to compute attention flow as proposed in the paper: https://samiraabnar.github.io/articles/2020-04/attention_flow

One of the authors has a GitHub repository, https://github.com/samiraabnar/attention_flow, which uses networkx to compute attention flow. However, it is very slow when dealing with long sequences.

Long story short, I want to leverage TensorFlow and GPU acceleration to speed up the computation. However, there is no ready-made tf op that does this.

In particular, computing the maximum flow requires a matmul-like matrix operation.

I want a tensor operation similar to y = tf.matmul(x1, x2), where tf.matmul returns a tensor y with y[...,i,j] = sum(x1[...,i,:] * x2[...,:,j]).

However, instead of the dot product, I want to define a new op that replaces the multiplication with a maximum, i.e. y = some_tf_op(x1, x2), where y[...,i,j] = sum(tf.maximum(x1[...,i,:], x2[...,:,j])).
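To pin down what the op should compute before worrying about TensorFlow, here is a minimal pure-Python sketch of that definition (the function name `sum_of_max` is my own, and loops are used only for clarity, not speed):

```python
# Reference semantics of the desired op, written with plain loops:
#   y[i, j] = sum_k max(x1[i, k], x2[k, j])
def sum_of_max(x1, x2):
    m, n = len(x1), len(x1[0])  # x1 has shape (m, n)
    p = len(x2[0])              # x2 has shape (n, p)
    return [
        [sum(max(x1[i][k], x2[k][j]) for k in range(n)) for j in range(p)]
        for i in range(m)
    ]

A = [[1, 2, 3], [4, 5, 6]]
B = [[6, 3], [5, 2], [4, 1]]
print(sum_of_max(A, B))  # → [[15, 8], [17, 15]]
```

This reproduces the expected output for the small example below, and is the baseline any vectorized version should match.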

I understand that it is doable outside of graph computation; however, I wish to place it inside a graph computation (e.g. the call function inside a tf.keras.layers.Layer or a tf.keras.Model) without expending too many resources.

i.e.:

import tensorflow as tf
# A is a tensor with shape (...,m,n), 
A = tf.constant([[1,2,3],[4,5,6]])

display('A',A)


# B is a tensor with shape (...,n,m)
B = tf.constant([[6,3],[5,2],[4,1]])
display('B',B)

@tf.function
def some_tf_op(x1,x2):
    ...
    return output
C = some_tf_op(A, B)
display('C',C)

Expected output:

A
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)>
B
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[6, 3],
       [5, 2],
       [4, 1]], dtype=int32)>
C
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[15,  8],
       [17, 15]], dtype=int32)>

I have come up with a solution, but it seems very expensive, especially when dealing with large tensors.

import tensorflow as tf
# A is a tensor with shape (...,m,n), 
A = tf.reshape(tf.range(3*4*5), (3,4,5))
display('A',A)


# B is a tensor with shape (...,n,p)
B = tf.transpose(tf.reshape(tf.range(3*6*5,0,-1), (3,6,5)), (0,2,1))
display('B',B)

@tf.function
def some_tf_op(x1, x2):
    A = x1
    B = x2
    # transpose B: (..., n, p) -> (..., p, n)
    B = tf.einsum('...ij->...ji', B)

    # tile A from (..., m, n) to (..., m, p, n)
    A = tf.expand_dims(A, axis=-2)
    A = tf.repeat(A, axis=-2, repeats=tf.shape(B)[-2])

    # tile B from (..., p, n) to (..., m, p, n)
    B = tf.expand_dims(B, axis=-3)
    B = tf.repeat(B, axis=-3, repeats=tf.shape(A)[-3])

    # elementwise maximum, then sum over the shared axis n
    C = tf.reduce_sum(tf.math.maximum(A, B), axis=-1)
    return C
C = some_tf_op(A, B)
# C is a tensor with shape (..., m, p)
display('C',C)

Output:

A
<tf.Tensor: shape=(3, 4, 5), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]], dtype=int32)>
B
<tf.Tensor: shape=(3, 5, 6), dtype=int32, numpy=
array([[[90, 85, 80, 75, 70, 65],
        [89, 84, 79, 74, 69, 64],
        [88, 83, 78, 73, 68, 63],
        [87, 82, 77, 72, 67, 62],
        [86, 81, 76, 71, 66, 61]],

       [[60, 55, 50, 45, 40, 35],
        [59, 54, 49, 44, 39, 34],
        [58, 53, 48, 43, 38, 33],
        [57, 52, 47, 42, 37, 32],
        [56, 51, 46, 41, 36, 31]],

       [[30, 25, 20, 15, 10,  5],
        [29, 24, 19, 14,  9,  4],
        [28, 23, 18, 13,  8,  3],
        [27, 22, 17, 12,  7,  2],
        [26, 21, 16, 11,  6,  1]]], dtype=int32)>
C
<tf.Tensor: shape=(3, 4, 6), dtype=int32, numpy=
array([[[440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315],
        [440, 415, 390, 365, 340, 315]],

       [[290, 265, 240, 215, 190, 165],
        [290, 265, 240, 215, 190, 165],
        [290, 265, 240, 215, 190, 169],
        [290, 265, 240, 215, 194, 185]],

       [[210, 210, 210, 210, 210, 210],
        [235, 235, 235, 235, 235, 235],
        [260, 260, 260, 260, 260, 260],
        [285, 285, 285, 285, 285, 285]]], dtype=int32)>
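One way the tiling cost might be trimmed: since elementwise maximum broadcasts in TensorFlow just as in NumPy, the two explicit `tf.repeat` calls can be dropped and the shapes merely aligned with singleton axes. The (..., m, p, n) intermediate is still materialized by the maximum itself, but no tiled copies of the inputs are made. A NumPy sketch of the idea (`some_np_op` is my own name; the TF version would use `tf.expand_dims`, `tf.maximum`, and `tf.reduce_sum` with the same axis arithmetic):

```python
import numpy as np

def some_np_op(x1, x2):
    # x1: (..., m, n), x2: (..., n, p)
    # Align shapes for broadcasting instead of tiling:
    #   x1 -> (..., m, 1, n), swapaxes(x2) -> (..., 1, p, n)
    a = x1[..., :, None, :]
    b = np.swapaxes(x2, -1, -2)[..., None, :, :]
    # Elementwise maximum broadcasts to (..., m, p, n); sum over n.
    return np.maximum(a, b).sum(axis=-1)

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[6, 3], [5, 2], [4, 1]])
print(some_np_op(A, B).tolist())  # → [[15, 8], [17, 15]]
```

This matches the expected output of the small example above; whether it is fast enough for long sequences would still need profiling on the actual attention matrices.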
