
How to use Stanford NLP tools with NLTK in Heroku?

I am developing a chatbot (for Kik Messenger) in Python and recently moved the app to Heroku, much as described in this question. I have also included NLTK (a Python module) and some of its resources, as described in the Heroku documentation. So far everything works, and the chatbot responds in Kik Messenger.

As a next step, I want to use tools from Stanford NLP through their NLTK API. The Stanford NLP tools are provided as a Java library, together with several model files. Locally, I got this working by setting up the API according to this answer. I don't know how to do the same on Heroku, though. Heroku has documentation on deploying executable jar files, but I don't see how to apply it to my problem.

The actual function I want to use is the Stanford parser that I invoke locally with:

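from nltk.parse.stanford import StanfordParser
# NLTK locates stanford-parser.jar and the models jar via CLASSPATH (or STANFORD_PARSER / STANFORD_MODELS)
parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")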
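NLTK's `StanfordParser` has to find both the parser jar and the models jar before it can start Java. A minimal sketch, assuming the jars were unpacked into a hypothetical `stanford-parser` directory inside the app: it collects every `.jar` in that directory into the `CLASSPATH` environment variable before the parser is created. The helper name `build_classpath` and the directory name are illustrative, not part of NLTK.

```python
import os
from pathlib import Path


def build_classpath(jar_dir):
    """Return an os.pathsep-joined string of every .jar file directly inside jar_dir."""
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    return os.pathsep.join(jars)


if __name__ == "__main__":
    # "stanford-parser" is a hypothetical directory name; adjust it to your layout.
    os.environ["CLASSPATH"] = build_classpath("stanford-parser")

    from nltk.parse.stanford import StanfordParser

    parser = StanfordParser(
        model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
    )
    # raw_parse yields one or more nltk.Tree objects for the sentence.
    for tree in parser.raw_parse("The quick brown fox jumps over the lazy dog."):
        print(tree)
```

Setting the variable from Python keeps the paths relative to the app, which matters on Heroku where the app directory is the only place build-time files survive.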

This is my first question on SO, so please let me know if and how I can edit this question so that it becomes easier to answer.

Edit: More generally, I have a Python application running on the Heroku cloud service (which has an ephemeral file system), and I want to include a Java library in it.

You'll need to include the JAR files in your app by downloading them at build time. Judging from the answer you linked to, you can do this with something like:

import nltk
nltk.download()
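As far as I know, `nltk.download()` fetches NLTK's own data packages, while the Stanford jars are distributed as a separate zip archive. A minimal sketch of a build-time step that downloads such an archive and extracts only its `.jar` files; the URL and destination directory are placeholders you would supply, and the helper names are hypothetical.

```python
import shutil
import sys
import urllib.request
import zipfile
from pathlib import Path


def extract_jars(zip_path, dest_dir):
    """Extract every .jar entry of zip_path into dest_dir (flattened); return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".jar"):
                continue
            target = dest / Path(name).name  # drop the archive's folder structure
            with archive.open(name) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)  # stream, since model jars can be large
            extracted.append(target)
    return extracted


def fetch_and_extract(url, dest_dir):
    """Download the archive at url (a placeholder you supply) and extract its jars."""
    zip_path, _ = urllib.request.urlretrieve(url)
    return extract_jars(zip_path, dest_dir)


if __name__ == "__main__":
    # Usage: python fetch_stanford.py <zip-url> <dest-dir>
    for jar in fetch_and_extract(sys.argv[1], sys.argv[2]):
        print(jar)
```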

You'll also need to add the JVM buildpack to your app:

$ heroku buildpacks:add heroku/jvm
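With the JVM buildpack added, the `java` executable should be on the dyno's PATH. A small standard-library sketch for failing fast when it is not, before NLTK tries to launch the parser; the function name `java_version` is hypothetical.

```python
import shutil
import subprocess


def java_version():
    """Return the output of `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip()


if __name__ == "__main__":
    version = java_version()
    if version is None:
        raise SystemExit("java not found; is the heroku/jvm buildpack installed?")
    print(version)
```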

In my case, it worked after deleting unneeded files from the models jar. Run the following in the stanford-parser directory to list the jar's contents, then remove entries until the jar is smaller than 100 MB, which is GitHub's file size limit:

jar tf stanford-parser-3.6.0-models.jar
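`jar tf` lists only entry names. To decide which entries are worth deleting, it helps to see their sizes too. A jar is a zip archive, so Python's `zipfile` module can read it; this sketch (the helper name is hypothetical) reports the largest entries by compressed size, which is what counts toward the jar's file size.

```python
import sys
import zipfile


def largest_entries(jar_path, limit=20):
    """Return (compressed_size, name) pairs for the largest entries, biggest first."""
    with zipfile.ZipFile(jar_path) as jar:
        sizes = [(info.compress_size, info.filename) for info in jar.infolist()]
    return sorted(sizes, reverse=True)[:limit]


if __name__ == "__main__":
    # Usage: python jar_sizes.py stanford-parser-3.6.0-models.jar
    for size, name in largest_entries(sys.argv[1]):
        print(f"{size / 1_000_000:8.1f} MB  {name}")
```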

Then delete the unnecessary entries with this command:

zip -d stanford-parser-3.6.0-models.jar edu/stanford/path/to/file
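If the `zip` tool is not available, a similar result can be had in Python. `zipfile` cannot delete entries in place, so this sketch writes a new jar that keeps only the manifest and the entries whose names start with the given prefixes. The prefix in the usage block is an assumption based on the model path in the question; the helper name is hypothetical.

```python
import os
import zipfile


def filter_jar(src_path, dst_path, keep_prefixes):
    """Write dst_path with only the entries of src_path that should be kept.

    An entry is kept if its name starts with META-INF/ (the manifest) or
    with one of keep_prefixes. Returns the kept entry names in order.
    """
    prefixes = ("META-INF/",) + tuple(keep_prefixes)
    kept = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.filename.startswith(prefixes):
                # Reusing the ZipInfo preserves each entry's name, timestamp and compression.
                dst.writestr(info, src.read(info.filename))
                kept.append(info.filename)
    return kept


if __name__ == "__main__":
    jar = "stanford-parser-3.6.0-models.jar"
    tmp = jar + ".tmp"
    # Assumption: only the English PCFG model used in the question is needed.
    kept = filter_jar(jar, tmp, ["edu/stanford/nlp/models/lexparser/englishPCFG"])
    os.replace(tmp, jar)  # keep the original file name so NLTK can still find it
    print(f"kept {len(kept)} entries")
```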

Finally, push your files to GitHub and deploy your app.
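Before pushing, a quick check for files that would still exceed GitHub's per-file limit can save a rejected push. A sketch that walks the project directory (skipping `.git`) and exits non-zero if anything is too large; the helper name and constant are hypothetical, and the threshold is a parameter.

```python
import os
import sys
from pathlib import Path

GITHUB_FILE_LIMIT = 100 * 1024 * 1024  # GitHub blocks files larger than 100 MiB


def oversized_files(root, limit=GITHUB_FILE_LIMIT):
    """Return sorted paths under root (excluding .git) whose size exceeds limit bytes."""
    too_big = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into .git
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size > limit:
                too_big.append(path)
    return sorted(too_big)


if __name__ == "__main__":
    offenders = oversized_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in offenders:
        print(f"{path.stat().st_size / 1024 / 1024:.1f} MiB  {path}")
    sys.exit(1 if offenders else 0)
```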
