
Keras and sentiment analysis prediction

Good morning,

I trained an LSTM network on the Yelp restaurants dataset (https://www.yelp.com/dataset). It is a large dataset, and training took several days on my PC. Anyway, I saved the model and weights and now want to use them for real-time sentiment predictions.

What is the common/good/best practice for doing this? I load the model and the weights, then compile it. That part is not an issue; there are plenty of examples in the documentation and on the Internet. But what next? Is all I need to do to tokenize the newly received review, pad it, and pass it to model.predict?

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Fit a tokenizer on the review text, then turn each review into a
# padded sequence of word indices.
tokenizer = Tokenizer(num_words=2500, split=' ')
tokenizer.fit_on_texts(data['text'].values)
print(tokenizer.word_index)
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X)
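(For completeness, the load step I mentioned above is just the standard Keras call; this is a minimal sketch and the file name is illustrative:)

from keras.models import load_model

# Illustrative path; load_model restores the architecture and weights
# (and the compile state, if the model was saved after compiling).
model = load_model("SavedModels/sentiment_lstm.h5")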

It cannot be that simple… If that is all that is required, then how is this connected to the tokenizer that was used to train the model? Tokenizing the more than 2.5 million reviews originally downloaded from the Yelp dataset was an expensive operation.

Thank you for any suggestions.

You will want to save the Tokenizer and reuse it at inference time to make sure that your test sentence is decomposed into the correct integers. See this answer for an example of how to do this.
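To see why refitting is not an option: the integer assigned to each word depends on the corpus the Tokenizer was fitted on, so a Tokenizer fitted on new text will generally map the same words to different integers than the one used in training. A quick illustration (the outputs in the comments are indicative):

from keras.preprocessing.text import Tokenizer

t1 = Tokenizer()
t1.fit_on_texts(["good food great service"])
t2 = Tokenizer()
t2.fit_on_texts(["great service good food"])

# Same vocabulary, different integer assignments:
print(t1.word_index)  # e.g. {'good': 1, 'food': 2, 'great': 3, 'service': 4}
print(t2.word_index)  # e.g. {'great': 1, 'service': 2, 'good': 3, 'food': 4}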

Yes, thank you, it worked perfectly. Just for the completeness of this thread:

I saved/loaded the tokenizer using:

import pickle

def save_tokenizer(file_path, tokenizer):
    # Persist the fitted tokenizer so the exact word_index survives restarts.
    with open(file_path, 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_tokenizer(file_path):
    # Restore the tokenizer that was fitted on the training corpus.
    with open(file_path, 'rb') as handle:
        tokenizer = pickle.load(handle)
    return tokenizer
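(The save side would be called once at training time, right after fitting; a sketch, with the path matching the load call below:)

# At training time, right after fitting the tokenizer on the full corpus:
tokenizer = Tokenizer(num_words=2500, split=' ')
tokenizer.fit_on_texts(data['text'].values)
save_tokenizer("SavedModels/tokenizer.pcl", tokenizer)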

Then used the tokenizer for predictions:

# Reload the tokenizer that was fitted on the training corpus
# (u is my own utility module).
tokenizer = u.load_tokenizer("SavedModels/tokenizer.pcl")

# Convert the new reviews with the *training* vocabulary and pad to the
# same length used during training (maxLength must match that value).
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X, maxlen=maxLength)
print(X)

# Restore the trained model (load_model_from_prefix is another helper).
model = u.load_model_from_prefix("single layer")
# Note: compiling is only required for training/evaluation;
# model.predict works on an uncompiled model as well.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

prediction = model.predict(X)

print(prediction)
print(np.argmax(prediction))
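(If the model was trained on one-hot encoded Yelp star ratings, which is an assumption since the thread does not say how the labels were encoded, the winning class index can be mapped back to a rating:)

import numpy as np

# Assumption: class indices 0..4 correspond to 1-5 star ratings.
stars = np.argmax(prediction, axis=1) + 1
print(stars)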

Thanks for your help.
