
TensorFlow Custom Estimator predict throwing value error

Note: this question has an accompanying, documented Colab notebook.

TensorFlow's documentation can, at times, leave a lot to be desired. Some of the older docs for lower-level APIs seem to have been expunged, and most newer documents point towards using higher-level APIs such as TensorFlow's subset of Keras or estimators. This would not be so problematic if the higher-level APIs did not so often rely closely on their lower levels. Case in point: estimators (especially the input_fn when using TensorFlow Records).

Over the following Stack Overflow posts:

and with the gracious assistance of the TensorFlow / StackOverflow community, we have moved closer to doing what the TensorFlow "Creating Custom Estimators" guide has not: demonstrating how to make an estimator one might actually use in practice (rather than a toy example), e.g. one which:

  • has a validation set for early stopping if performance worsens,
  • reads from TF Records, because many datasets are larger than TensorFlow's recommended 1 GB limit for in-memory data, and
  • saves its best version whilst training

While I still have many questions regarding this (from the best way to encode data into a TF Record, to what exactly the serving_input_fn expects), there is one question that stands out more prominently than the rest:

How to predict with the custom estimator we just made?

Under the documentation for predict, it states:

input_fn: A function that constructs the features. Prediction continues until input_fn raises an end-of-input exception (tf.errors.OutOfRangeError or StopIteration). See Premade Estimators for more information. The function should construct and return one of the following:

  • A tf.data.Dataset object: Outputs of Dataset object must have same constraints as below.
  • features: A tf.Tensor or a dictionary of string feature name to Tensor. features are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
  • A tuple, in which case the first item is extracted as features.
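Concretely, the three accepted return forms might look like the sketch below (the feature key 'input_tensors' and the shapes are assumptions for illustration, not from the docs):

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory features; the key mirrors the one used later on.
features = {'input_tensors': np.zeros((2, 4, 3), dtype=np.float32)}

# 1) Return a tf.data.Dataset whose elements are the feature dict.
def input_fn_dataset():
    return tf.data.Dataset.from_tensor_slices(features).batch(1)

# 2) Return the features directly (a Tensor, or a dict of name -> Tensor).
def input_fn_features():
    return {'input_tensors': tf.constant(features['input_tensors'])}

# 3) Return a (features, labels) tuple; predict() only uses the first item.
def input_fn_tuple():
    return {'input_tensors': tf.constant(features['input_tensors'])}, None
```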

(perhaps) Most likely, if one is using estimator.predict, they are using data in memory such as a dense tensor (because a held-out test set would likely go through evaluate).

So I, in the accompanying Colab, create a single dense example, wrap it up in a tf.data.Dataset, and call predict — and get a ValueError.

I would greatly appreciate it if someone could explain to me how I can:

  1. load my saved estimator
  2. given a dense, in-memory example, predict the output with the estimator
to_predict = random_onehot((1, SEQUENCE_LENGTH, SEQUENCE_CHANNELS))\
        .astype(tf_type_string(I_DTYPE))
pred_features = {'input_tensors': to_predict}

pred_ds = tf.data.Dataset.from_tensor_slices(pred_features)
predicted = est.predict(lambda: pred_ds, yield_single_examples=True)

next(predicted)

ValueError: Tensor("IteratorV2:0", shape=(), dtype=resource) must be from the same graph as Tensor("TensorSliceDataset:0", shape=(), dtype=variant).

When you use the tf.data.Dataset module, it actually defines an input graph which is independent from the model graph. What happens here is that you first created a small graph by calling tf.data.Dataset.from_tensor_slices(), then the estimator API created a second graph by calling dataset.make_one_shot_iterator() automatically. These two graphs can't communicate, so it throws an error.

To circumvent this, you should never create a dataset outside of estimator.train/evaluate/predict. This is why everything data-related is wrapped inside input functions.

def predict_input_fn(data, batch_size=1):
  dataset = tf.data.Dataset.from_tensor_slices(data)
  return dataset.batch(batch_size).prefetch(None)

predicted = est.predict(lambda: predict_input_fn(pred_features), yield_single_examples=True)
next(predicted)

Now, the graph is not created outside of the predict call.

I also added dataset.batch(), because the rest of your code expects batched data and it was throwing a shape error. Prefetch just speeds things up.
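Putting the two questioned steps together (loading a saved estimator, then predicting on one dense in-memory example), a minimal self-contained sketch follows. The trivial model_fn, the temporary model_dir, and the learning rate are stand-ins for the real training setup; with an estimator, "loading" is just re-instantiating with the same model_fn and model_dir, since predict() restores the latest checkpoint found there.

```python
import tempfile

import numpy as np
import tensorflow as tf

def model_fn(features, labels, mode):
    # Stand-in for the real model_fn: a single trainable scalar weight.
    w = tf.compat.v1.get_variable('w', shape=[], dtype=tf.float32)
    preds = features['x'] * w
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={'y': preds})
    loss = tf.reduce_mean(tf.square(preds - labels))
    train_op = tf.compat.v1.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

model_dir = tempfile.mkdtemp()  # stand-in for your real model_dir

def train_input_fn():
    # Toy data: learn y = 2x.
    xs = np.arange(8, dtype=np.float32)
    ds = tf.data.Dataset.from_tensor_slices(({'x': xs}, xs * 2.0))
    return ds.batch(4).repeat(20)

tf.estimator.Estimator(model_fn=model_fn, model_dir=model_dir).train(train_input_fn)

# Step 1: "load" the saved estimator by re-instantiating with the same
# model_fn and model_dir; predict() restores the latest checkpoint there.
reloaded = tf.estimator.Estimator(model_fn=model_fn, model_dir=model_dir)

def predict_input_fn():
    # Step 2: build the dataset *inside* the input_fn so it lives in
    # predict()'s own graph, per the answer above.
    return tf.data.Dataset.from_tensor_slices(
        {'x': np.array([3.0], np.float32)}).batch(1)

prediction = next(reloaded.predict(predict_input_fn))
print(prediction['y'])  # should be close to 6.0 once w has converged to 2
```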
