How to use Stanford NLP tools with NLTK in Heroku?

I am developing a chatbot (for Kik messenger) with Python and recently moved my app to Heroku, pretty much as described in this question. Additionally, I have included NLTK (a Python module) and some of its resources as described in the Heroku documentation. Up to this point, things work nicely and the chatbot app responds in the Kik messenger.

As a next step, I want to include tools from Stanford NLP through their NLTK API. The Stanford NLP tools are provided as a Java repository, together with several model files. Locally, I got this working after setting up the API according to this answer. I don't know how to do this on Heroku, though. Heroku has documentation on how to deploy executable JAR files, but I don't see how to apply it to my problem.

The actual function I want to use is the Stanford parser, which I invoke locally with:

from nltk.parse.stanford import StanfordParser

# Load the English PCFG model from the Stanford models JAR on the classpath.
parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

This is my first question on SO, so please let me know if and how I can edit this question so that it becomes easier to answer.

Edit: On a more general level, I have a Python application that runs on the Heroku cloud service (with an ephemeral file system), and I want to include a Java repository in it.

You'll need to include the JAR files in your app by downloading them at build time. Judging from the answer you linked to, you can do this with something like:

import nltk
nltk.download()
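As a rough illustration of the build-time step, the sketch below pulls only the JAR files out of an already-downloaded Stanford parser archive using the standard library. The function name `extract_jars` and any archive or target directory names you pass in are assumptions made for illustration; they are not part of NLTK or Heroku.

```python
import os
import shutil
import zipfile


def extract_jars(archive_path, dest_dir):
    """Copy every .jar entry of a zip archive into dest_dir (flattened)."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.endswith(".jar"):
                continue
            # Drop the archive's internal folder structure (hypothetical layout).
            target = os.path.join(dest_dir, os.path.basename(info.filename))
            # Stream the entry to disk so large model JARs are not held in memory.
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return sorted(extracted)
```

Because the Heroku file system is ephemeral, a step like this would have to run during the build (so the JARs end up in the slug) rather than at runtime.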

You'll also need to add the JVM buildpack to your app:

$ heroku buildpacks:add heroku/jvm
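Once the buildpack is in place, the app can check at startup that a `java` executable is reachable and locate the JAR files before constructing the parser. The following is a minimal sketch under assumptions: the helper name `stanford_parser_kwargs` and the directory layout are hypothetical, and the `path_to_jar` / `path_to_models_jar` keyword names are the ones accepted by `nltk.parse.stanford.StanfordParser` in NLTK releases that ship this module, so verify them against your installed version.

```python
import glob
import os
import shutil


def stanford_parser_kwargs(jar_dir, which=shutil.which):
    """Build keyword arguments for StanfordParser from a directory of JARs."""
    # The JVM buildpack is expected to put `java` on PATH.
    if which("java") is None:
        raise RuntimeError("java not found on PATH; is the JVM buildpack installed?")
    jars = sorted(glob.glob(os.path.join(jar_dir, "stanford-parser*.jar")))
    models = [j for j in jars if j.endswith("-models.jar")]
    # Exclude the models, sources and javadoc JARs to find the code JAR.
    skip = ("-models.jar", "-sources.jar", "-javadoc.jar")
    code = [j for j in jars if not j.endswith(skip)]
    if not models or not code:
        raise FileNotFoundError("Stanford parser JARs not found in " + jar_dir)
    return {
        "path_to_jar": code[0],
        "path_to_models_jar": models[0],
        "model_path": "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
    }
```

The result could then be passed along as `StanfordParser(**stanford_parser_kwargs(jar_dir))`, where `jar_dir` is wherever your build step placed the JARs.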

In my case, it worked after I deleted unnecessary class files from the models JAR file. The goal is to get the JAR file under 100 MB, because GitHub rejects pushes that contain larger files. First, list the JAR's contents by running this in the stanford-parser directory:

jar tf stanford-parser-3.6.0-models.jar

Then delete the unnecessary class files with this command:

zip -d stanford-parser-3.6.0-models.jar edu/stanford/path/to/file

Finally, push your files to GitHub and deploy them to your app.
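Because a JAR file is a zip archive, the same inspection and trimming can also be sketched with Python's standard `zipfile` module. The helper names `largest_entries` and `write_trimmed_jar` are hypothetical, and which entries are safe to drop depends on the models your app actually loads.

```python
import zipfile


def largest_entries(jar_path, count=10):
    """Return (name, uncompressed_size_in_bytes) for the biggest JAR entries."""
    with zipfile.ZipFile(jar_path) as jar:
        sizes = [(info.filename, info.file_size) for info in jar.infolist()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]


def write_trimmed_jar(src_path, dst_path, keep_prefixes):
    """Copy src JAR to dst, keeping only entries whose names start with a prefix."""
    prefixes = tuple(keep_prefixes)
    kept = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefixes):
                # Reuse the original entry metadata for the copied entry.
                dst.writestr(info, src.read(info.filename))
                kept.append(info.filename)
    return kept
```

Unlike `zip -d`, this writes a new file and leaves the original JAR untouched, which makes it easier to retry with a different set of prefixes if the parser fails to load a model.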
