TensorFlow Serving retrained Inception

I am trying to serve my retrained Inception model following this guide (you may also want to look at this guide, which explains how to retrain Inception). I have modified retrain.py to export my model as follows:

... # Same as in the original script:
# Set up the pre-trained graph.
maybe_download_and_extract()
graph, bottleneck_tensor, jpeg_data_tensor, resized_image_tensor = (create_inception_graph())
... # Same as in the original script:
# Add the new layer that we'll be training.
(train_step, cross_entropy, bottleneck_input,
 ground_truth_input, final_tensor) = add_final_training_ops(
     len(image_lists.keys()), FLAGS.final_tensor_name, bottleneck_tensor)
... # Added at the end of the original script:
# Export model
with graph.as_default():
    export_path = sys.argv[-1]
    print('Exporting trained model to', export_path)
    saver = tf.train.Saver(sharded=True)
    model_exporter = exporter.Exporter(saver)
    signature = exporter.classification_signature(input_tensor=jpeg_data_tensor, scores_tensor=final_tensor)
    model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)
    model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess)
    print('Done exporting!')

if __name__ == '__main__':
  tf.app.run()

After exporting my model, I start the server:

/serving/bazel-bin/tensorflow_serving/example/inception_inference --port=9000 EXPORT_DIR &> inception_log &

The server log file (inception_log) contains:

I tensorflow_serving/core/basic_manager.cc:190] Using InlineExecutor for BasicManager.
I tensorflow_serving/example/inception_inference.cc:384] Waiting for models to be loaded...
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:147] File-system polling found servable version {name: default version: 1} at path /tf_files/scope/export/00000001
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:129] Attempting to load a SessionBundle from: /tf_files/scope/export/00000001
I tensorflow_serving/example/inception_inference.cc:384] Waiting for models to be loaded...
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:147] File-system polling found servable version {name: default version: 1} at path /tf_files/scope/export/00000001
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:106] Running restore op for SessionBundle
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:203] Done loading SessionBundle
I tensorflow_serving/example/inception_inference.cc:350] Running...
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:147] File-system polling found servable version {name: default version: 1} at path /tf_files/scope/export/00000001
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:147] File-system polling found servable version {name: default version: 1} at path /tf_files/scope/export/00000001
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:147] File-system polling found servable version {name: default version: 1} at path /tf_files/scope/export/00000001
... 

Finally, when I run the client, I get the following error:

/serving/bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=TEST_IMG
D0805 09:10:46.208704633     200 ev_posix.c:101]             Using polling engine: poll
Traceback (most recent call last):
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tensorflow_serving/example/inception_client.py", line 53, in <module>
    tf.app.run()
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/external/org_tensorflow/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tensorflow_serving/example/inception_client.py", line 48, in main
    result = stub.Classify(request, 10.0)  # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 300, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 198, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
    grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INTERNAL, details="FetchOutputs node : not found")
E0805 09:10:47.129263239     200 chttp2_transport.c:1810]    close_transport: {"created":"@1470388247.129230608","description":"FD shutdown","file":"src/core/lib/iomgr/ev_poll_posix.c","file_line":427}

Any advice or guidance on this would be greatly appreciated.

So, in my experience, the approach linked from the TensorFlow site is just one way to serve the model. A better way to serve it is from Flask and Kubernetes, since that is much more lightweight on the client side than the whole TensorFlow Serving infrastructure, but this assumes your traffic is not very large (> 100 QPS). Although you can serve Inception with Flask and Kubernetes, at that rate I would go with the inline solution.
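For context, here is a minimal sketch of what the Flask approach described above could look like, assuming a frozen retrained graph on disk. The file name, tensor names, route, and port are assumptions for illustration, not details from the question; substitute the names produced by your own retrain/export step.

from flask import Flask, request, jsonify
import numpy as np
import tensorflow as tf

app = Flask(__name__)

# Load the frozen, retrained graph once at startup.
# 'retrained_graph.pb' and the tensor names below are placeholders.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
sess = tf.Session(graph=graph)

@app.route('/classify', methods=['POST'])
def classify():
    # Expect the raw JPEG bytes in the request body.
    image_data = request.get_data()
    softmax_tensor = graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    return jsonify(scores=np.squeeze(predictions).tolist())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Putting a Kubernetes Deployment and Service in front of an app like this gives you the scaling mentioned above without building the tensorflow_serving binaries.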

You could serve it through a remote service, and that would work, but depending on your infrastructure you "could" also serve the model inside a streaming job that pushes your requests through an apache_beam.DoFn and then writes the output back to an MQ that your job is listening on. That is just another option. Hope this helps.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import logging 
import tensorflow as tf
import numpy as np
import apache_beam as beam


class InferenceFn(beam.DoFn):

  def __init__(self, model_dict):
    super(InferenceFn, self).__init__()
    self.model_dict = model_dict
    self.graph = None
    self.create_graph()


  def create_graph(self):
    """Loads the frozen model file into a graph held on the DoFn."""
    # Fetch the model file if it is not already available locally
    # (download_model_file is assumed to be defined elsewhere in the job).
    if not tf.gfile.Exists(self.model_dict['model_full_path']):
      self.download_model_file()
    with tf.Graph().as_default() as graph:
      with tf.gfile.FastGFile(self.model_dict['model_full_path'], 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
    self.graph = graph

  def start_bundle(self):
    """Prevents graph object serialization until serving. Required for GCP Serving"""
    self.create_graph()

  def process(self, element):
    """Core Processing Fn for Apache Beam."""
    try:
      with tf.Session(graph=self.graph) as sess:
        if not tf.gfile.Exists(element):
          tf.logging.fatal('File does not exist %s', element)
          raise ReferenceError("Couldnt Find the image {}".format(element))
        data = tf.gfile.FastGFile(element, 'rb').read()
        output_tensor = sess.graph.get_tensor_by_name(self.model_dict['output_tensor_name'])
        predictions = sess.run(output_tensor, {self.model_dict['input_tensor_name']: data})
        predictions = np.squeeze(predictions)
        yield str(predictions)
    except Exception:
      # Log the failing element (with traceback) instead of crashing the bundle.
      logging.exception("We hit an error in inference on {}".format(element))
