
How to import Keras Tokenizer to Java Deeplearning4j (DL4J)

I have implemented text classification of the 20 Newsgroups data using Keras (2.1.4 on TensorFlow). The accuracy is a decent 0.87. I am also able to save the model and tokenizer and use them in another Python program to predict the class of a text file. I am using the code below to save the model and tokenizer:

# creates a HDF5 file 'my_model.h5'
model.model.save('my_model.h5')

# Save Tokenizer i.e. Vocabulary
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

If you need to refer to the complete code: http://www.opencodez.com/python/text-classification-using-keras.htm

Now I am looking to import the Keras-trained model and tokenizer into a Java web application. Deeplearning4j provides an option to load the Keras model with

MultiLayerNetwork network = KerasModelImport.importKerasSequentialModelAndWeights("PATH TO YOUR H5 FILE");

But I could not find any option to load the Tokenizer or its metadata.

As per my limited understanding, you would need both the model and the saved vocabulary metadata (the tokenizer) to predict accurately.

Any help or pointers to achieve this are much appreciated.

You need an equivalent tokenizer on the Java side: you have to manage the vocabulary and vectorize the text in order to preprocess it and feed the model. You can easily create a tokenizer that mimics the Python one using Java regex, as sketched below. You can also check the Stanford NLP Group software in Java and Apache OpenNLP.
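Here is a minimal sketch of such a tokenizer, assuming you export the Keras tokenizer's word_index from Python (for example as JSON) and load it into a plain Map on the Java side. The class KerasLikeTokenizer is hypothetical; it mirrors the Keras Tokenizer defaults (lowercase, strip punctuation, split on whitespace), and its textToSequence method plays the role of texts_to_sequences, skipping out-of-vocabulary words.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical tokenizer that mimics the default behaviour of
 * keras.preprocessing.text.Tokenizer: lowercase, strip punctuation,
 * split on whitespace, then map each word to its index in word_index.
 */
public class KerasLikeTokenizer {

    // Same filter characters as the Keras Tokenizer default.
    private static final String FILTERS = "[!\"#$%&()*+,\\-./:;<=>?@\\[\\]^_`{|}~\\t\\n]";

    private final Map<String, Integer> wordIndex;

    // wordIndex is the word -> index map exported from the Python tokenizer
    // (e.g. dumped from tokenizer.word_index as JSON and parsed here).
    public KerasLikeTokenizer(Map<String, Integer> wordIndex) {
        this.wordIndex = wordIndex;
    }

    /** Splits a text the same way the Keras Tokenizer does. */
    public List<String> textToWords(String text) {
        String cleaned = text.toLowerCase().replaceAll(FILTERS, " ");
        List<String> words = new ArrayList<>();
        for (String token : cleaned.split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Equivalent of texts_to_sequences for a single text; unknown words are skipped. */
    public List<Integer> textToSequence(String text) {
        List<Integer> sequence = new ArrayList<>();
        for (String word : textToWords(text)) {
            Integer index = wordIndex.get(word);
            if (index != null) {
                sequence.add(index);
            }
        }
        return sequence;
    }
}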

For vectorization, check Deeplearning4j's DataVec, a vectorization and ETL (Extract, Transform, Load) library for Java. Perhaps more interesting, check Deeplearning4j's NLP functionality. A sketch of vectorizing a text and running it through the imported model follows.
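The sketch below assumes the Keras model was trained on fixed-size binary bag-of-words vectors (as produced by texts_to_matrix in 'binary' mode) with a vocabulary of 1000 words; if your model was trained on padded sequences or TF-IDF features, the feature-building step must be adapted. The helper loadWordIndexSomehow and the KerasLikeTokenizer class from the previous sketch are hypothetical placeholders.

import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Map;

public class TextClassifierExample {

    public static void main(String[] args) throws Exception {
        // Load the model exported from Keras (my_model.h5).
        MultiLayerNetwork network =
                KerasModelImport.importKerasSequentialModelAndWeights("my_model.h5");

        // wordIndex must contain the same word -> index mapping the Python
        // tokenizer used during training (loaded from JSON, a properties file, etc.).
        Map<String, Integer> wordIndex = loadWordIndexSomehow();   // hypothetical helper
        KerasLikeTokenizer tokenizer = new KerasLikeTokenizer(wordIndex);

        // vocabSize must match the num_words / input size used during training.
        int vocabSize = 1000;
        String text = "Some news article text to classify ...";

        // Binary bag-of-words vector, analogous to texts_to_matrix(..., mode='binary').
        float[] features = new float[vocabSize];
        for (int index : tokenizer.textToSequence(text)) {
            if (index < vocabSize) {
                features[index] = 1.0f;
            }
        }

        INDArray input = Nd4j.create(features, new int[]{1, vocabSize});
        INDArray output = network.output(input);

        // Index of the highest-scoring class.
        int predictedClass = Nd4j.argMax(output, 1).getInt(0);
        System.out.println("Predicted class: " + predictedClass);
    }

    // Placeholder: load tokenizer.word_index exported from Python (e.g. as JSON).
    private static Map<String, Integer> loadWordIndexSomehow() {
        throw new UnsupportedOperationException("Load word_index exported from the Keras tokenizer");
    }
}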

Another approach would be to create a classifier web service in Python (using Flask or another Python web framework) and expose calls to it, exchanging JSON/XML data with the Java-based web application.
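If you go the web-service route, the Java side only needs an HTTP client. Below is a minimal sketch using java.net.http (available since Java 11); the endpoint URL and the JSON request/response format are assumptions and depend entirely on how you design the Flask service.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClassifierClient {

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint of the Python/Flask classifier service.
        String endpoint = "http://localhost:5000/classify";

        // Hypothetical request payload; the actual schema depends on your service.
        String json = "{\"text\": \"Some news article text to classify\"}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        // The Flask service does the tokenization and prediction with the
        // original Keras tokenizer and model, and returns e.g. {"label": "..."}.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}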
