
TensorFlow: Is there a way to measure FLOPS for a model?

The closest example I could find is in this issue: https://github.com/tensorflow/tensorflow/issues/899

With this minimal reproducible code:

import tensorflow as tf
import tensorflow.python.framework.ops as ops 
g = tf.Graph()
with g.as_default():
  A = tf.Variable(tf.random_normal( [25,16] ))
  B = tf.Variable(tf.random_normal( [16,9] ))
  C = tf.matmul(A,B) # shape=[25,9]
for op in g.get_operations():
  flops = ops.get_stats_for_node_def(g, op.node_def, 'flops').value
  if flops is not None:
    print('Flops should be ~', 2*25*16*9)
    print('25 x 25 x 9 would be', 2*25*25*9)  # ignores internal dim, repeats first
    print('TF stats gives', flops)

However, the FLOPS value returned is always None. Is there a way to concretely measure FLOPS, especially for a model loaded from a pb file?

I would like to build on Tobias Schnek's answer as well as answer the original question: how to get the FLOP count from a pb file.

Running the first snippet of code from Tobias's answer with TensorFlow 1.6.0:

import tensorflow as tf

g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A = tf.Variable(tf.random_normal([25,16]))
    B = tf.Variable(tf.random_normal([16,9]))
    C = tf.matmul(A,B)

    opts = tf.profiler.ProfileOptionBuilder.float_operation()    
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ~',2*25*16*9)
        print('TF stats gives',flops.total_float_ops)

We get the following output:

Flops should be ~ 7200
TF stats gives 8288

So, why do we get 8288 instead of the expected result 7200 = 2*25*16*9 [a]? The answer lies in the way the tensors A and B are initialised. Initialising with a Gaussian distribution costs some FLOP: the two random_normal ops sample 400 + 144 = 544 values, and at (presumably) 2 FLOP per sampled value this accounts exactly for the extra 1088 FLOP. Changing the definition of A and B to

    A = tf.Variable(initial_value=tf.zeros([25, 16]))
    B = tf.Variable(initial_value=tf.zeros([16, 9]))

gives the expected output 7200.

Usually, a network's variables are initialised with Gaussian distributions, among other schemes. Most of the time we are not interested in the initialisation FLOP, as they are performed once during initialisation and happen during neither training nor inference. So, how can one get the exact number of FLOP while disregarding the initialisation FLOP?

Freeze the graph into a pb file. Calculating the FLOP from a pb file was, in fact, the OP's use case.

The following snippet illustrates this:

import tensorflow as tf
from tensorflow.python.framework import graph_util

def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

# ***** (1) Create Graph *****
g = tf.Graph()
sess = tf.Session(graph=g)
with g.as_default():
    A = tf.Variable(initial_value=tf.random_normal([25, 16]))
    B = tf.Variable(initial_value=tf.random_normal([16, 9]))
    C = tf.matmul(A, B, name='output')
    sess.run(tf.global_variables_initializer())
    flops = tf.profiler.profile(g, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP before freezing', flops.total_float_ops)
# *****************************        

# ***** (2) freeze graph *****
output_graph_def = graph_util.convert_variables_to_constants(sess, g.as_graph_def(), ['output'])

with tf.gfile.GFile('graph.pb', "wb") as f:
    f.write(output_graph_def.SerializeToString())
# *****************************


# ***** (3) Load frozen graph *****
g2 = load_pb('./graph.pb')
with g2.as_default():
    flops = tf.profiler.profile(g2, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP after freezing', flops.total_float_ops)

outputs

FLOP before freezing 8288
FLOP after freezing 7200

[a] Usually, the FLOP count of a matrix multiplication is mq(2p - 1) for the product AB, where A is [m, p] and B is [p, q], but TensorFlow returns 2mpq for some reason. An issue has been opened to understand why.
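For the example above (m = 25, p = 16, q = 9), the two conventions give mq(2p - 1) = 25 * 9 * 31 = 6975 versus 2mpq = 2 * 25 * 16 * 9 = 7200, the latter being the value the profiler reports after freezing.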

A little bit late, but maybe it helps some visitors in the future. For your example I successfully tested the following snippet:

import tensorflow as tf

g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A = tf.Variable(tf.random_normal( [25,16] ))
    B = tf.Variable(tf.random_normal( [16,9] ))
    C = tf.matmul(A,B) # shape=[25,9]

    opts = tf.profiler.ProfileOptionBuilder.float_operation()    
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ~',2*25*16*9)
        print('25 x 25 x 9 would be',2*25*25*9) # ignores internal dim, repeats first
        print('TF stats gives',flops.total_float_ops)
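On TensorFlow 1.x this prints 8288 rather than 7200, because the two random_normal initialisers are counted as well; see the answer above for the details and for how freezing the graph removes them from the count.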

It's also possible to use the profiler in combination with Keras, as in the following snippet:

import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)
    net = MobileNet(alpha=.75, input_tensor=tf.placeholder('float32', shape=(1,32,32,3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()    
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()    
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))

I hope I could help!

The above approaches no longer work for TF 2.0, as the profiler methods have been deprecated and moved under compat.v1. It seems this feature still needs to be reimplemented natively for TF2.

Below is the relevant issue on GitHub: https://github.com/tensorflow/tensorflow/issues/32809
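In the meantime, a workaround along the lines of what is suggested in that issue is to trace the model into a tf.function, freeze its variables into constants, and run the compat.v1 profiler on the resulting graph. The sketch below assumes a Keras model with a single input; get_flops is just a hypothetical helper name:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

def get_flops(model, batch_size=1):
    # Trace the model into a concrete function with a fixed input shape
    spec = tf.TensorSpec([batch_size] + list(model.inputs[0].shape[1:]),
                         model.inputs[0].dtype)
    concrete_func = tf.function(lambda x: model(x)).get_concrete_function(spec)
    # Freeze the variables into constants to obtain a plain GraphDef,
    # free of initialisation ops (mirroring the pb approach above)
    frozen_func = convert_variables_to_constants_v2(concrete_func)
    # Profile the frozen graph with the compat.v1 profiler
    with tf.Graph().as_default() as graph:
        tf.compat.v1.import_graph_def(frozen_func.graph.as_graph_def(), name='')
        opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
        flops = tf.compat.v1.profiler.profile(graph=graph,
                                              run_meta=tf.compat.v1.RunMetadata(),
                                              cmd='op', options=opts)
    return flops.total_float_ops

# A 16 -> 9 Dense layer without bias is a single matmul, so this should
# report roughly 2*1*16*9 = 288 FLOPs under TF's 2mpq counting
model = tf.keras.Sequential([tf.keras.layers.Dense(9, input_shape=(16,), use_bias=False)])
print(get_flops(model))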

Another user posted an answer; it was deleted by a moderator, so it cannot be restored. But it does solve the problem, better than the other answers, so I repeat it here.


You can use the following pip package to get some basic information, like the model's memory requirement, number of parameters, FLOPs, etc.

https://pypi.org/project/model-profiler

It'll output something like:

Model Profile                       Value                  Unit
Selected GPUs                       ['0', '1']             GPU IDs
No. of FLOPs                        0.30932349055999997    BFLOPs
GPU Memory Requirement              7.4066760912537575     GB
Model Parameters                    138.357544             Million
Memory Required by Model Weights    527.7921447753906      MB

Usage

[Copied verbatim from the library website]

from tensorflow.keras.applications import VGG16

model = VGG16(include_top=True)

from model_profiler import model_profiler

Batch_size = 128
profile = model_profiler(model, Batch_size)

print(profile)
