How to add NLTK corpora to a Google Cloud Function?
I am trying to run a Google Cloud Function that uses NLTK. I added textblob==0.15.3 and nltk==3.4.3 to requirements.txt, but every time I run the script it crashes and the log shows "Please use the NLTK Downloader to obtain the resource:".
I know the NLTK corpora need to be downloaded to run the script on a local system, but I am not sure how to download them in Google Cloud Functions. Any help would be greatly appreciated. Thanks in advance.
There are two ways to specify dependencies for Cloud Functions written in Python: using the pip package manager's requirements.txt file, or packaging local dependencies alongside your function. You can find instructions there. Also check this link for a possible solution.
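For the first approach, a minimal requirements.txt for this setup might look like the following (the pinned versions are the ones from the question):

```
nltk==3.4.3
textblob==0.15.3
```

Note that requirements.txt only installs the packages themselves; the corpora still have to be bundled separately, as described in the other answer.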
This is how I get the nltk_data through my Travis pipeline:
# To install the core NLTK package
pip install nltk
# Installs only the extra packages you need. You could also use 'all' instead.
python -m nltk.downloader punkt averaged_perceptron_tagger wordnet
Then you can copy the folder into your function folder, and zip it up:
mkdir -p function/nltk_data/
cp -a ~/nltk_data/. function/nltk_data/
cp -a path/to/your/code/. function/
Be sure to set the NLTK_DATA environment variable. Since my folder structure was:
- nltk_data/
- main.py
- requirements.txt
I just needed to set NLTK_DATA=nltk_data, and then Python can find the files.
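If you prefer not to rely on the deployment environment for the variable, you can also set it from code. A minimal sketch, assuming the nltk_data folder sits next to main.py as in the layout above:

```python
import os

# NLTK consults the NLTK_DATA environment variable when searching for
# corpora, so set it before the first `import nltk` runs in main.py.
# Cloud Functions runs from the source root, so getcwd() points there.
os.environ["NLTK_DATA"] = os.path.join(os.getcwd(), "nltk_data")

print(os.environ["NLTK_DATA"])
```

Alternatively, `nltk.data.path.append(...)` after importing nltk achieves the same thing without touching the environment.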
Hope this helps!