TensorFlow: Dramatic loss of accuracy after freezing graph?

Is it common to see a dramatic loss of accuracy after freezing a graph for serving? During training and evaluation on the flowers dataset with a pretrained inception-resnet-v2, my accuracy is 98-99%, with a probability of 90+% for the correct predictions. However, after freezing my graph and running predictions with it, my model is not as accurate, and the right labels are only predicted with a confidence of 30-40%.

After model training, I had several items:

  1. Checkpoint file
  2. model.ckpt.index file
  3. model.ckpt.meta file
  4. model.ckpt file
  5. a graph.pbtxt file.
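
For context, here is a minimal sketch of the TF 1.x calls that typically produce these files (sess is assumed to hold the trained model and train_dir is a placeholder output directory, so this is illustrative rather than the exact training code):

import os
import tensorflow as tf

# Assumption: sess is a tf.Session holding the trained model and train_dir already exists
train_dir = './train_dir'
saver = tf.train.Saver()
# Writes checkpoint, model.ckpt.meta, model.ckpt.index and the variable data file(s)
saver.save(sess, os.path.join(train_dir, 'model.ckpt'))
# Writes the graph definition as text (graph.pbtxt); as_text=False would produce a binary .pb instead
tf.train.write_graph(sess.graph_def, train_dir, 'graph.pbtxt', as_text=True)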

As I was unable to run the official freeze graph script located in the tensorflow repository on GitHub (I suspect this is because my training produced a pbtxt file rather than a pb file), I am reusing the code from this tutorial instead.

Here is the code I modified to freeze my graph:

import os, argparse

import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder, input_checkpoint):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    # input_checkpoint = checkpoint.model_checkpoint_path

    # We specify the full filename of our frozen graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"

    # Before exporting our graph, we need to specify our output node
    # This is how TF decides which part of the graph it has to keep and which part it can discard
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "InceptionResnetV2/Logits/Predictions"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess, # The session is used to retrieve the weights
            input_graph_def, # The graph_def is used to retrieve the nodes 
            output_node_names.split(",") # The output node names are used to select the useful nodes
        ) 

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    parser.add_argument("--input_checkpoint", type = str, help = "Input checkpoint name")
    args = parser.parse_args()

    freeze_graph(args.model_folder, args.input_checkpoint)
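
Assuming the script above is saved as freeze.py (the script name and the paths below are placeholders), it would be invoked roughly like this, writing frozen_model.pb into the checkpoint's folder:

python freeze.py --model_folder ./train_dir --input_checkpoint ./train_dir/model.ckpt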

This is the code I use to run my prediction, where I feed in only one image, as a user would:

import tensorflow as tf
from scipy.misc import imread, imresize
import numpy as np

img = imread("./dandelion.jpg")
img = imresize(img, (299,299,3))
img = img.astype(np.float32)
img = np.expand_dims(img, 0)

labels_dict = {0:'daisy', 1:'dandelion',2:'roses', 3:'sunflowers', 4:'tulips'}

#Define the filename of the frozen graph
graph_filename = "./frozen_model.pb"

#Create a graph def object to read the graph
with tf.gfile.GFile(graph_filename, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

#Construct the graph and import the graph from graphdef
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def)

    #We define the input and output node we will feed in
    input_node = graph.get_tensor_by_name('import/batch:0')
    output_node = graph.get_tensor_by_name('import/InceptionResnetV2/Logits/Predictions:0')

    with tf.Session() as sess:
        predictions = sess.run(output_node, feed_dict = {input_node: img})
        print(predictions)
        label_predicted = np.argmax(predictions[0])

    print('Predicted Flower:', labels_dict[label_predicted])
    print('Prediction probability:', predictions[0][label_predicted])

And the output I received from running my prediction:

2017-04-11 17:38:21.722217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-11 17:38:21.722608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.42GiB
2017-04-11 17:38:21.722624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-11 17:38:21.722630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-11 17:38:21.722642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
2017-04-11 17:38:22.183204: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-11 17:38:22.183232: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-04-11 17:38:22.184007: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xb85a1c0 executing computations on platform Host. Devices:
2017-04-11 17:38:22.184022: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): <undefined>, <undefined>
2017-04-11 17:38:22.184140: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-11 17:38:22.184149: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-04-11 17:38:22.184610: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xb631ee0 executing computations on platform CUDA. Devices:
2017-04-11 17:38:22.184620: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): GeForce GTX 860M, Compute Capability 5.0
[[ 0.1670652   0.46482906  0.12899996  0.12481128  0.11429448]]
Predicted Flower: dandelion
Prediction probability: 0.464829

Potential source of the problem: I first trained my model using TF 0.12, but I believe it is compatible with TF 1.01, the version I'm using now. As a safety precaution, I upgraded my files to TF 1.01 and retrained the model to obtain a new set of checkpoint files (with the same accuracy), and then used these checkpoint files for freezing. I compiled my TensorFlow from source. Does the issue come from the fact that I use a pbtxt file instead of a pb file? I have no idea how I could get a pb file from training my model.

I believe the problem is not related to freezing the model. Instead, it is related to the way you pre-process your image.

I recommend using the default pre-processing function of InceptionResnet V2.

Below, I will post code that takes an image path (JPG or PNG) and returns a preprocessed image. You can modify it to receive a batch of images. It is not professional code and needs some optimization, but it works well.

First, loading the image:

def load_img(path_img):
    """
    Load an image to tensorflow
    :param path_img: image path on the disk
    :return: 3D tensorflow image
    """
    filename_queue = tf.train.string_input_producer([path_img])  # list of files to read

    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    my_img = tf.image.decode_image(value)  # use png or jpg decoder based on your files.

    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)

        # Start populating the filename queue.

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        for i in range(1):  # length of your filename list
            image = my_img.eval()  # here is your image Tensor :)

        print(image.shape)
        # Image.fromarray(np.asarray(image)).show()

        coord.request_stop()
        coord.join(threads)

        return image

Then, the pre-processing code:

def preprocess(image, height, width,
               central_fraction=0.875, scope=None):
    """Prepare one image for evaluation.

    If height and width are specified it would output an image with that size by
    applying resize_bilinear.

    If central_fraction is specified it would crop the central fraction of the
    input image.

    Args:
      image: 3-D Tensor of image. If dtype is tf.float32 then the range should be
        [0, 1], otherwise it would be converted to tf.float32 assuming that the range
        is [0, MAX], where MAX is the largest positive representable number for the
        int(8/16/32) data type (see `tf.image.convert_image_dtype` for details)
      height: integer
      width: integer
      central_fraction: Optional Float, fraction of the image to crop.
      scope: Optional scope for name_scope.
    Returns:
      3-D float Tensor of prepared image.
    """

    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Crop the central region of the image with an area containing 87.5% of
    # the original image.
    if central_fraction:
        image = tf.image.central_crop(image, central_fraction=central_fraction)

    if height and width:
        # Resize the image to the specified height and width.
        image = tf.expand_dims(image, 0)
        image = tf.image.resize_bilinear(image, [height, width],
                                         align_corners=False)
        image = tf.squeeze(image, [0])
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image
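
Chaining the two functions for a single image would look roughly like this (a sketch: 299x299 is the InceptionResnetV2 input size, and the file path is a placeholder):

image = load_img("./dandelion.jpg")  # decoded image as a numpy array
image = preprocess(image, 299, 299)  # TF ops: central crop, resize, scale to [-1, 1]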

Finally, for my case, I had to convert the processed tensor into a numpy array:

image = tf.Session().run(image)

So, this image can be fed to the frozen model:

persistent_sess = tf.Session(graph=graph)  # , config=sess_config)

# 'prefix' is the name the frozen graph was imported under via tf.import_graph_def;
# the prediction code earlier in the question uses the default name 'import' instead.
input_node = graph.get_tensor_by_name('prefix/batch:0')
output_node = graph.get_tensor_by_name('prefix/InceptionResnetV2/Logits/Predictions:0')

predictions = persistent_sess.run(output_node, feed_dict={input_node: [image]})
print(predictions)
label_predicted = np.argmax(predictions[0])
print(label_predicted)

I had a similar issue: the accuracy was 1.5% lower when using a frozen model. The problem was the saver object in the code that freezes the model. You need to pass the moving average decay to the saver as an argument. I use the code from the Inception model, and this is how I create the saver in the freezing script:

variable_averages = tf.train.ExponentialMovingAverage(0.9997)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)

For me it solved the problem.
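
For reference, a sketch of how such a saver could slot into the freezing script from the question, replacing the saver returned by import_meta_graph. This only applies if training actually maintained exponential-moving-average shadow variables, and the 0.9997 decay value is the Inception default, i.e. an assumption here:

# Build the graph from the .meta file, but restore the moving-average (shadow)
# values into the variables instead of the raw training weights.
tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)
variable_averages = tf.train.ExponentialMovingAverage(0.9997)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
    saver.restore(sess, input_checkpoint)
    # ... then call graph_util.convert_variables_to_constants as in the freezing script above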
