Keras and sentiment analysis prediction

Good morning,

I trained an LSTM network on the Yelp restaurants dataset (https://www.yelp.com/dataset). It is a large dataset, and training took several days on my PC. I saved the model and weights and now want to use them for real-time sentiment predictions.

What is the common or best practice for doing this? I load the model and the weights, then compile it. That part is not an issue; there are plenty of examples in the documentation and on the Internet. But what comes next? Is all I need to do tokenize the newly received review, pad it, and pass it to model.predict?

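from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Fit a new tokenizer on the incoming reviews, then convert them to padded sequences
tokenizer = Tokenizer(num_words=2500, split=' ')
tokenizer.fit_on_texts(data['text'].values)
print(tokenizer.word_index)
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X)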

It cannot be that simple… If that is all that is required, how is this connected to the tokenizer that was used to train the model? Tokenizing the more than 2.5 million reviews originally downloaded from the Yelp dataset was an expensive operation.

Thank you for any suggestions.

You will want to save the Tokenizer and reuse it at inference time to make sure that your test sentence is decomposed into the correct integers. See this answer for an example on how to do this.
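
To see why the tokenizer has to be the same one, here is a minimal sketch in plain Python. `build_word_index` and `to_sequence` are hypothetical stand-ins that only mimic the idea of frequency-ranked word indices; they are not the Keras implementation, and the sample texts are made up.

```python
from collections import Counter

def build_word_index(texts):
    """Hypothetical stand-in for fitting a tokenizer: the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ranked, start=1)}

def to_sequence(text, word_index):
    """Map known words to their integers; words missing from the index are dropped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

train_texts = ["the food was great", "the service was slow", "great food great price"]
new_texts = ["the pizza was great"]

train_index = build_word_index(train_texts)  # the mapping the model was trained with
refit_index = build_word_index(new_texts)    # what you get if you fit again at inference

review = "the pizza was great"
seq_saved = to_sequence(review, train_index)
seq_refit = to_sequence(review, refit_index)
print(seq_saved)
print(seq_refit)
```

The same review comes out as two different integer sequences, so a model trained on the first mapping would read the second one as different words. Reloading the saved tokenizer also avoids repeating the expensive fit on the full dataset.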

Yes, thank you, that worked perfectly. Just for completeness of this thread:

I saved / loaded the tokenizer using:

import pickle

def save_tokenizer(file_path, tokenizer):
    with open(file_path, 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_tokenizer(file_path):
    with open(file_path, 'rb') as handle:
        tokenizer = pickle.load(handle)
    return tokenizer
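
A quick way to check that helpers like these round-trip correctly is sketched below. It repeats the two functions so the snippet runs on its own, and uses a plain dict as a stand-in for the fitted tokenizer (an assumption made only so that Keras is not needed to run it); the file name is just an example.

```python
import os
import pickle
import tempfile

def save_tokenizer(file_path, tokenizer):
    with open(file_path, 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_tokenizer(file_path):
    with open(file_path, 'rb') as handle:
        return pickle.load(handle)

# Stand-in for the fitted tokenizer (assumption: any picklable object behaves the same way here).
fake_tokenizer = {"word_index": {"great": 1, "food": 2, "the": 3}, "num_words": 2500}

# Write to a temporary directory so the check leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tokenizer.pcl")
    save_tokenizer(path, fake_tokenizer)
    restored = load_tokenizer(path)

round_trip_ok = restored == fake_tokenizer
print(round_trip_ok)
```

Since pickle can execute code while loading, only load tokenizer files that you created yourself or otherwise trust.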

Then used the tokenizer for predictions:

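import numpy as np
from keras.preprocessing.sequence import pad_sequences

# u is the helper module that holds load_tokenizer (above) and load_model_from_prefix
tokenizer = u.load_tokenizer("SavedModels/tokenizer.pcl")

# maxLength must match the sequence length the model was trained with
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X, maxlen=maxLength)
print(X)

model = u.load_model_from_prefix("single layer")
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

prediction = model.predict(X)

print(prediction)
# axis=1 returns the predicted class for each review rather than one index over the whole batch
print(np.argmax(prediction, axis=1))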
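
For anyone wondering what the padding and the argmax step do to the data, here is a rough plain-Python sketch. `pad_to_length` and `argmax_per_row` are hypothetical helpers that mirror what I understand the defaults of `pad_sequences` to be (zeros added at the front, overly long sequences cut from the front); the value of `max_length` and the scores are made up for illustration.

```python
def pad_to_length(sequence, max_length, pad_value=0):
    """Hypothetical helper: left-pad with pad_value, or keep only the last max_length items."""
    if len(sequence) >= max_length:
        return sequence[len(sequence) - max_length:]
    return [pad_value] * (max_length - len(sequence)) + sequence

def argmax_per_row(rows):
    """Index of the largest score in each row (one row per review)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

max_length = 6  # assumption: must equal the input length the model was trained with

short_review = [3, 4, 1]
long_review = [3, 4, 1, 2, 5, 6, 7, 3]

padded_short = pad_to_length(short_review, max_length)
padded_long = pad_to_length(long_review, max_length)
print(padded_short)
print(padded_long)

# Made-up scores for two reviews and two classes, purely for illustration.
fake_scores = [[0.9, 0.1], [0.2, 0.8]]
classes = argmax_per_row(fake_scores)
print(classes)
```

Every review ends up with the same fixed length regardless of how many words it had, which is why the length used at prediction time has to be the one used during training.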

Thanks for your help.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address.