How to use predict for NLP in Tensorflow-keras?

Question

I have a bit of a problem when predicting on a named entity recognition (NER) set. Training and testing went well. Now I want to test on raw data such as strings.

I tried to use

model.predict(['Elon musk is good guy , he owns spacex, tesla.'])

but it throws an error:

UnimplementedError:  Cast string to float is not supported
     [[node functional_29/Cast (defined at <ipython-input-210-e13dae4a124d>:1) ]] [Op:__inference_predict_function_223088]

Function call stack:
predict_function

I have token2index and tag2index dictionaries built from the training set. I tried to convert the input with them, but the prediction shows 0 for everything:

word = ['Elon musk is good guy , he owns spacex, tesla.']
word_index = [[token2idx[word] for word in word]]
X = pad_sequences(sequences=word_index, maxlen=7, padding='post')
predicted = np.argmax(model.predict(X), axis=-1) 
print(predicted)

This gives array([[0, 0, 0, 0, 0, 0, 0]]), which is not correct. I even tried a sentence from x_train[0], but it fails the same way. Thank you for helping.

Answer 1

I guess you want to predict words, right?

Then you should split your sentence into words:

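import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentence = 'Elon musk is good guy , he owns spacex, tesla.'
# Split the raw string into tokens and look each one up in the training vocabulary
word_index = [[token2idx[word] for word in sentence.split(' ')]]
# maxlen should equal the padded sequence length the model was trained with
X = pad_sequences(sequences=word_index, maxlen=7, padding='post')
predicted = np.argmax(model.predict(X), axis=-1)
print(predicted)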
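A minimal, TensorFlow-free sketch of the same preprocessing, useful for checking the inputs before calling `model.predict`. It assumes (hypothetically) that `token2idx` was built from whitespace-split training tokens, that index 0 is padding, and that unknown tokens map to a hypothetical `UNK` index 1; the tiny vocabularies and the `B-PER`/`I-PER` tags below are made up. It also shows why the question's `for word in word` loop fails: iterating over a one-element list yields the whole sentence as a single key. Note that Keras `pad_sequences` truncates from the front by default (`truncating='pre'`), so a 10-token sentence with `maxlen=7` loses its first three tokens; this sketch keeps the first `maxlen` tokens instead. The hypothetical `to_tags` helper inverts `tag2idx` to turn argmax output back into tag names.

```python
# Hypothetical toy vocabularies; the real ones come from the training set.
PAD, UNK = 0, 1
token2idx = {"<pad>": PAD, "<unk>": UNK, "Elon": 2, "musk": 3, "is": 4, "good": 5, "guy": 6}
tag2idx = {"O": 0, "B-PER": 1, "I-PER": 2}


def encode(sentence, token2idx, maxlen):
    """Split on whitespace, map tokens to indices (UNK if unseen), then post-pad."""
    ids = [token2idx.get(tok, UNK) for tok in sentence.split()]
    ids = ids[:maxlen]  # keep the first maxlen tokens (like truncating='post')
    return ids + [PAD] * (maxlen - len(ids))


def to_tags(pred_ids, n_tokens, tag2idx):
    """Invert tag2idx and drop the positions that were padding."""
    idx2tag = {i: t for t, i in tag2idx.items()}
    return [idx2tag[i] for i in pred_ids[:n_tokens]]


sentence = "Elon musk is good guy"
# The bug in the question: iterating over a one-element list yields the whole sentence.
print([w for w in [sentence]])  # ['Elon musk is good guy']
print(encode(sentence, token2idx, maxlen=7))  # [2, 3, 4, 5, 6, 0, 0]
# Suppose np.argmax(model.predict(X), axis=-1)[0] returned these indices:
print(to_tags([1, 2, 0, 0, 0, 0, 0], n_tokens=5, tag2idx=tag2idx))
# ['B-PER', 'I-PER', 'O', 'O', 'O']
```

In a real pipeline, `np.array([encode(sentence, token2idx, maxlen)])` would be what goes into `model.predict`, with `maxlen` set to the padded length used during training.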

Update

As the discussion showed, the issue was that the model reached a high accuracy during training, but its output was always zero.

Since your y classes are not evenly distributed, the model learns that improving its prediction for one class raises the accuracy very quickly. Your y data looks something like this: [0,0,0,0,0,0,0,1,0,0,0,0,0,3,0]. With three classes (0, 1, 3), the model quickly learns to predict zeros well, since that increases the accuracy the most. But the model then learns only to predict 0, which already gives it a high accuracy.

For example, when a sequence contains 20 words, i.e. 20 y values of which 19 are 0, the model reaches an accuracy of 95% just by always predicting 0. So in that case a high accuracy is no measure of the model's quality: to improve performance across all classes, a jump from 95% to 98% improves the model far more than the jump from 50% to 95%.
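The 95% figure can be reproduced in plain Python. The labels below are made up to match the 20-token example: an all-zero predictor scores 0.95 accuracy, yet per-class recall shows it never finds the single entity token. Per-class recall and its macro average are one common way (not the only one; precision and F1 are others) to expose this; the helper names are illustrative.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of token positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its tokens predicted correctly."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# 20 tokens: 19 of class 0 and one entity token of class 1.
y_true = [0] * 19 + [1]
y_pred = [0] * 20  # a model that always predicts 0

print(accuracy(y_true, y_pred))  # 0.95
recall = per_class_recall(y_true, y_pred)
print(recall)  # {0: 1.0, 1: 0.0}
print(sum(recall.values()) / len(recall))  # 0.5 (macro-averaged recall)
```

The macro average weights every class equally, so ignoring the rare entity class halves the score instead of costing only 5%.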

Solution 1 (score 0, accepted) 2020-08-28 09:20:19