Flops in tensorflow: Matrix multiplication
Inspired by this question, I tried to measure the FLOPs required by TensorFlow for a matrix-matrix multiplication.
For two matrices A and B with sizes (m×p) and (p×n), respectively, the resulting matrix C = AB of size (m×n) has mn entries. For each entry, p multiplications and (p-1) additions are required. Hence, the total number of operations is mn(2p-1).
With the code from the linked question/answer, TensorFlow outputs m*n*2p instead; see the code below.
Why is this approximation returned and not the theoretical value? In the worst case, p=1, this approximation is a factor of 2 larger than the correct value.
import numpy as np
import tensorflow as tf

g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A = tf.convert_to_tensor(np.random.rand(13, 9))
    B = tf.convert_to_tensor(np.random.rand(9, 7))
    C = tf.matmul(A, B)  # shape=[13, 7]

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ', 13*7*(2*9-1))
        print('Approximation 2*13*7*9 =', 2*13*7*9)
        print('TF stats gives', flops.total_float_ops)

# Output:
# Flops should be  1547
# Approximation 2*13*7*9 = 1638
# TF stats gives 1638
I think this is because, in practice, summations are often coded like this (Python-style sketch below):
total = 0
for i in range(p):
    total += x[i] * y[i]
That is, the first element x[0] * y[0] is added to total (which is 0 at that point), yielding p additions rather than p-1.
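To see this concretely, here is a small runnable sketch (the names `dot_with_op_count`, `x`, and `y` are hypothetical, just for illustration) that tallies the operations performed by the naive loop:

```python
def dot_with_op_count(x, y):
    # Naive accumulation as above: total starts at 0, so every one
    # of the p terms costs one multiply AND one add.
    total = 0.0
    mults = adds = 0
    for xi, yi in zip(x, y):
        total += xi * yi
        mults += 1
        adds += 1  # the first add is the superfluous "0 + x[0]*y[0]"
    return total, mults, adds

print(dot_with_op_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
# (32.0, 3, 3): p multiplies and p adds, i.e. 2p ops instead of 2p - 1
```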
You could try to be smart and avoid this extra addition:
total = x[0] * y[0]
for i in range(1, p):
    total += x[i] * y[i]
... but then what happens if p == 0? Ouch, we need to add an extra comparison:
if p > 0:
    total = x[0] * y[0]
    for i in range(1, p):
        total += x[i] * y[i]
else:
    total = 0
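Wrapped in a function (the name `dot_guarded` is hypothetical), the guarded variant runs like this, doing p multiplies but only p-1 adds:

```python
def dot_guarded(x, y):
    # Guarded variant from above: seed total with the first product,
    # so a length-p dot product costs p multiplies but only p - 1 adds,
    # at the price of one extra comparison.
    p = len(x)
    if p > 0:
        total = x[0] * y[0]
        for i in range(1, p):
            total += x[i] * y[i]
    else:
        total = 0
    return total

print(dot_guarded([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(dot_guarded([], []))                            # 0
```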
The thing is, this comparison is not a flop and will not appear in your flop count -- yet in practice it is as costly as, if not more costly than, a simple add.
Bottom line:
I'm not sure why, but I think this is the "coded" theoretical value:
...
@ops.RegisterStatistics("MatMul", "flops")
def _calc_mat_mul_flops(graph, node):
    """Calculates the compute resources needed for MatMul."""
    transpose_a = node.attr["transpose_a"].b
    a_shape = graph_util.tensor_shape_from_node_def_name(graph, node.input[0])
    a_shape.assert_is_fully_defined()
    if transpose_a:
        k = int(a_shape[0])
    else:
        k = int(a_shape[1])
    output_shape = graph_util.tensor_shape_from_node_def_name(graph, node.name)
    output_shape.assert_is_fully_defined()
    output_count = np.prod(output_shape.as_list())
    return ops.OpStats("flops", (k * output_count * 2))
...
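As a quick check, the same estimate can be reproduced outside the profiler. This is a sketch under the assumption of an (m×p) by (p×n) product with no transposes; `matmul_flops_estimate` is a hypothetical helper name:

```python
def matmul_flops_estimate(m, p, n):
    # Mirrors _calc_mat_mul_flops: k is the shared inner dimension,
    # output_count is the number of entries in the (m x n) result.
    k = p
    output_count = m * n
    return 2 * k * output_count

print(matmul_flops_estimate(13, 9, 7))  # 1638, matching flops.total_float_ops above
```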