Is there a way to use a KMeans TensorFlow saved model in BigQuery?

I know this is somewhat pointless, since BigQuery ML now provides KMeans with good initialization. Nonetheless, I was required to train a model in TensorFlow and then pass it to BigQuery for prediction.

I saved my model and everything works fine until I try to upload it to BigQuery, where I get the following error:

TensorFlow SavedModel output output has an unsupported shape: unknown_rank: true

So my question is: is it impossible to use a KMeans model trained in TensorFlow in BigQuery?

Edit:

Creating the model:

kmeans = tf.compat.v1.estimator.experimental.KMeans(num_clusters=8, use_mini_batch=False, initial_clusters=KMEANS_PLUS_PLUS_INIT, seed=1234567890, relative_tolerance=.001)

Serving function:

def serving():
    inputs = {}
    # for feat in df.columns:
    #     inputs[feat] = tf.placeholder(shape=[None], dtype=tf.float32)
    inputs = tf.placeholder(shape=[None, 9], dtype=tf.float32)
    return tf.estimator.export.TensorServingInputReceiver(inputs, inputs)

Saving the model:

kmeans.export_saved_model("gs://<bucket>/tf_clustering_model",
                          serving_input_receiver_fn=serving,
                          checkpoint_path='/tmp/tmpdsleqpi3/model.ckpt-19',
                          experimental_mode=tf.estimator.ModeKeys.PREDICT)

Loading to BigQuery:

query="""
CREATE MODEL `<project>.<dataset>.kmeans_tensorflow` OPTIONS(MODEL_TYPE='TENSORFLOW', MODEL_PATH='gs://<bucket>/tf_clustering_model/1581439348/*')
"""
job = bq.Client().query(query)
job.result()

Edit 2:

The output of the saved_model_cli command is the following:

jupyter@tensorflow-20200211-182636:~$ saved_model_cli  show --dir . --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['all_distances']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: unknown_rank
        name: add:0
  Method name is: tensorflow/serving/predict

signature_def['cluster_index']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: Squeeze_1:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: Squeeze_1:0
  Method name is: tensorflow/serving/predict

All signatures seem to have unknown rank for the output shapes. How can I set up the export of this particular estimator, or is there something I can search for to help me?

Final Edit:

This really seems to be unsupported, at least as far as I could take it. My approaches varied, but in the end I saw little choice but to take the code from the source of the KMeansClustering class (and the surrounding code on GitHub) and attempt to reshape the outputs somehow. In the process, I realized that the results object was actually a tuple containing some different kind of Tensor class, one that seemed to be used only to construct the graph. Interestingly enough, if I took this tuple and did something like:

model_predictions[0][0]...[0]

the object was always some odd Tensor. I went up to sixty-something levels of indexing in place of the three dots and eventually gave up.

From there I turned to the class that feeds these outputs to KMeansClustering, called KMeans, in clustering ops (and the surrounding code on GitHub). Again I had no success in changing the data type, but I did understand why the output's name was Squeeze_1: the output goes through a squeeze operation there. I thought this could be the problem and attempted to remove the squeeze operation, among other things... I failed :(

Finally, I realized that this output seemed to actually come from the estimator.py file, and at that point I gave up on it.

Thank you to all who commented; I would not have come this far without you. Cheers.

You can check the shapes in the SavedModel by using the saved_model_cli command-line program that ships with TensorFlow.
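When a model has several signatures, scanning the saved_model_cli output by eye gets tedious. The following is a minimal sketch, not part of TensorFlow, that parses the text printed by `saved_model_cli show --dir . --all` and lists every output whose shape is reported as unknown_rank. The function name `find_unknown_rank_outputs` is hypothetical, and the parser assumes the output layout shown in the question.

```python
import re


def find_unknown_rank_outputs(cli_text):
    """Return (signature, output_name) pairs whose shape is unknown_rank.

    Assumes the text layout printed by `saved_model_cli show --all`.
    """
    flagged = []
    signature = None
    section = None       # "inputs" or "outputs"
    tensor_name = None
    for line in cli_text.splitlines():
        line = line.strip()
        sig = re.match(r"signature_def\['(.+)'\]:", line)
        if sig:
            signature = sig.group(1)
            continue
        tensor = re.match(r"(inputs|outputs)\['(.+)'\] tensor_info:", line)
        if tensor:
            section, tensor_name = tensor.group(1), tensor.group(2)
            continue
        # Only output tensors matter for the BigQuery import error.
        if line.startswith("shape:") and section == "outputs":
            if line.split(":", 1)[1].strip() == "unknown_rank":
                flagged.append((signature, tensor_name))
    return flagged


sample = """
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: Squeeze_1:0
"""
print(find_unknown_rank_outputs(sample))
```

Any pair it reports is an output that would need a defined shape before import.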

Make sure your export signature in TensorFlow specifies the shape of the output tensor.

What this error means: the TF model output named "output" has a completely undefined shape (unknown_rank: true means that the model does not even specify the number of dimensions).

For BigQuery to be able to use the TensorFlow model, it has to be able to convert the model output into a BigQuery type: either a single primitive scalar or a one-dimensional array of primitives.
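As a rough illustration of that rule, the sketch below classifies a signature's output shape. It is only an approximation of the description above and not BigQuery's actual validation logic: the function name `is_bigquery_compatible` is hypothetical, and it assumes the first dimension is the batch dimension, so that each row must be a scalar or a one-dimensional array.

```python
def is_bigquery_compatible(shape):
    """Approximate check of an output shape against the rule above.

    shape: None for unknown rank, otherwise a tuple of ints in which
    -1 marks an unknown dimension. The first dimension is assumed to
    be the batch (row) dimension.
    """
    if shape is None:
        # unknown_rank: not even the number of dimensions is known.
        return False
    per_row = shape[1:]
    # Per row: rank 0 is a scalar, rank 1 is a one-dimensional array.
    return len(per_row) <= 1


for candidate in [None, (-1,), (-1, 8), (-1, 8, 2)]:
    print(candidate, is_bigquery_compatible(candidate))
```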

You may be able to add a tf.reshape operation at the end of the graph to shape this output into something that BigQuery can load.
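One way this could be attempted without touching the estimator's source is to load the exported SavedModel, wrap its serving_default signature in a new function that reshapes the result, and save the wrapper. This is an untested sketch under several assumptions: TensorFlow 2.x is installed, the exported model loads with tf.saved_model.load, and the signature keys are 'input' and 'output' as in the saved_model_cli listing. The names `reexport_with_reshape`, `Wrapper`, and `cluster_index` are hypothetical, and whether BigQuery accepts the re-exported model is not confirmed here.

```python
def reexport_with_reshape(src_dir, dst_dir, num_features=9):
    """Re-export a SavedModel so its output has a known rank of 1."""
    # Imported lazily so that defining this function does not require
    # TensorFlow; the body assumes TensorFlow 2.x.
    import tensorflow as tf

    loaded = tf.saved_model.load(src_dir)
    predict = loaded.signatures["serving_default"]

    class Wrapper(tf.Module):
        def __init__(self):
            super().__init__()
            # Keep a reference so the loaded variables are tracked
            # and written out again when the wrapper is saved.
            self.loaded = loaded

        @tf.function(input_signature=[
            tf.TensorSpec([None, num_features], tf.float32, name="input")])
        def serve(self, features):
            raw = predict(input=features)["output"]
            # Flatten to one value per row: shape (None,) rather than
            # unknown rank.
            return {"cluster_index": tf.reshape(raw, [-1])}

    wrapper = Wrapper()
    tf.saved_model.save(
        wrapper, dst_dir, signatures={"serving_default": wrapper.serve})
```

After re-exporting, running saved_model_cli on the new directory would show whether the output shape is now reported as (-1) instead of unknown_rank.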

It's not obvious what your KMeans model is outputting. I'm guessing it might be trying to output all of the clusters as one big tensor? Was this model created using the TensorFlow KMeans Estimator?

The main issue is that the output tensor of the TF built-in KMeans estimator model has unknown rank in the saved model.

Two possible ways to solve this:

  • Try training the KMeans model on BQML directly.
  • Reimplement the TF KMeans estimator model to reshape the output tensor into a specific tensor shape.
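For the first option, a minimal sketch of what training directly in BQML might look like is shown below. It only builds the SQL string; the helper name `build_bqml_kmeans_query` and the `<project>.<dataset>.<table>` placeholders are hypothetical, and the options mirror the question's settings (8 clusters, k-means++ initialization) on the assumption that the source table contains only the feature columns.

```python
def build_bqml_kmeans_query(model_name, source_table, num_clusters=8):
    """Build a CREATE MODEL statement for a BigQuery ML k-means model."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(MODEL_TYPE='KMEANS', NUM_CLUSTERS={int(num_clusters)}, "
        f"KMEANS_INIT_METHOD='KMEANS++')\n"
        f"AS SELECT * FROM `{source_table}`"
    )


query = build_bqml_kmeans_query(
    "<project>.<dataset>.kmeans_bqml", "<project>.<dataset>.<table>")
print(query)
```

The resulting string can be submitted the same way as in the question, with bq.Client().query(query).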
