
Keras model.predict function giving input shape error

I have implemented the Universal Sentence Encoder in TensorFlow and am now trying to predict the class probabilities for a sentence. I also convert the string to an array before passing it in.

Code:

if model.model_type == "universal_classifier_basic":
    class_probs = model.predict(np.array(['this is a random sentence'], dtype=object))

Error Message:

InvalidArgumentError (see above for traceback): input must be a vector, got shape: []
     [[Node: lambda_1/module_apply_default/tokenize/StringSplit = StringSplit[skip_empty=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_1/module_apply_default/RegexReplace_1, lambda_1/module_apply_default/tokenize/Const)]]

Any leads, suggestions, or explanations are welcome and highly appreciated. Thank you :)

It is not as easy as you would like. Usually a model expects a vector of integers as input, where each integer is the index of the corresponding word in a vocabulary. For example, if you have

vocab = {"hello":0, "world":1}

and you want to feed the sentence "hello world" to the network, then you should build the vector as follows:

net_input = [vocab.get(word) for word in "hello world".split(" ")]
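
One thing to watch for with the lookup above: vocab.get(word) returns None for any word that is not in the vocabulary. A minimal sketch of one way to handle this is to reserve an index for unknown words. The "UNK" entry and the encode helper are hypothetical names used only for illustration; they are not part of the original code.

```python
# Hypothetical vocabulary; the "UNK" entry is an assumption added for illustration.
vocab = {"hello": 0, "world": 1, "UNK": 2}

def encode(sentence, vocab):
    """Map each space-separated word to its index, falling back to UNK."""
    return [vocab.get(word, vocab["UNK"]) for word in sentence.split(" ")]

net_input = encode("hello world", vocab)      # both words are in the vocabulary
unknown_input = encode("hello there", vocab)  # "there" falls back to UNK
```

Whether an unknown-word index is appropriate depends on how the vocabulary was built at training time.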

Note also that if you trained the network with mini-batches, you will also need to add an extra first dimension to the vector you feed to the network. You can easily do this with NumPy:

import numpy as np
net_input = np.expand_dims(net_input, 0)

This way, net_input has the shape [1, 2] and you can feed it into the network.
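
To check the resulting shape, here is a small self-contained sketch. The token_ids values are just the indices for "hello world" from the example vocabulary, and the variable names are illustrative.

```python
import numpy as np

token_ids = [0, 1]                      # indices for "hello world"
batched = np.expand_dims(token_ids, 0)  # add a leading batch dimension
batch_shape = batched.shape             # one sentence, two tokens
print(batch_shape)
```

The leading dimension is the batch size, so a single sentence becomes a batch of one.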

There is still a problem that could stop you from feeding the network such a vector. At training time you probably defined a placeholder for the input with a fixed length (for example, 30 or 40 tokens). At test time you need to match that size, either by padding your sentence if it doesn't fill the whole length or by cutting it if it is longer.

You can truncate or add padding as follows:

# net_input must be a list of token-id lists here (not a NumPy array), so that + concatenates
net_input = [old_in[:max_len] + [vocab.get("PAD")] * (max_len - len(old_in[:max_len])) for old_in in net_input]

This line of code truncates the input, if necessary, to the maximum length with old_in[:max_len] (note that Python leaves the list unchanged if its length is less than max_len) and then fills the difference between max_len and the actual length, that is (max_len - len(old_in[:max_len])) slots, with padding tokens ( + [vocab.get("PAD")] ).
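
The same truncate-then-pad logic can be sketched as a small helper, which may be easier to read and test. The pad_or_truncate name, the "PAD" index, and max_len=3 are assumptions chosen for illustration; in practice max_len should match the input length used at training time.

```python
def pad_or_truncate(token_ids, max_len, pad_id):
    """Return a list of exactly max_len ids: cut long inputs, pad short ones."""
    clipped = token_ids[:max_len]  # no-op when the list is already short enough
    return clipped + [pad_id] * (max_len - len(clipped))

# Hypothetical vocabulary with a padding entry.
vocab = {"hello": 0, "world": 1, "PAD": 2}

batch = [[0, 1], [0, 1, 0, 1, 0]]  # one short and one long sentence
fixed = [pad_or_truncate(ids, max_len=3, pad_id=vocab["PAD"]) for ids in batch]
```

Applying the helper before converting to a NumPy array keeps every row the same length, so the batch can be stacked into a single array afterwards.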

Hope this helps.

If this is not your situation, just leave a comment on the answer and I'll try to figure out other solutions.
