
Any way to import Python's nltk.download('punkt') into Google Cloud Functions?

I need the Punkt tokenizer data in a Google Cloud Function. Calling nltk.download('punkt') directly in main.py slows the function down significantly, since punkt has to be downloaded every time it runs. Is there a way to avoid this by making punkt available some other way?

EDIT #1: I changed my code and project structure to match what Barak suggested, but I keep getting the same error:

Error: function terminated. Recommended action: inspect logs for termination reason. Details:

**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/tmp/nltk_data'
    - '/env/nltk_data'
    - '/env/share/nltk_data'
    - '/env/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
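The "Searched in" list above is the set of directories NLTK checked. As a quick diagnostic, a small helper can report which of those directories actually contain a tokenizers/punkt folder. This is only a sketch using the standard library: the name dirs_with_punkt is made up for illustration and is not part of NLTK. In a real function you would probably pass it nltk.data.path.

```python
import os


def dirs_with_punkt(search_paths):
    """Return the search paths that contain a tokenizers/punkt folder.

    Hypothetical helper for debugging; not part of NLTK.
    """
    found = []
    for base in search_paths:
        # NLTK's search list can contain an empty string; skip it.
        if base and os.path.isdir(os.path.join(base, "tokenizers", "punkt")):
            found.append(base)
    return found


if __name__ == "__main__":
    # Two of the paths from the error message above, used as sample input.
    print(dirs_with_punkt(["/tmp/nltk_data", "/env/nltk_data"]))
```

An empty result means none of the searched directories holds the data, which matches the error in the question.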

Add nltk to your requirements.txt.

Install nltk on your local machine, if you haven't already:

pip install nltk

Then download the nltk_data files. In my case, for tokenizers, I needed the Punkt tokenizer module:

python -m nltk.downloader punkt  

Copy them (on Windows they are under AppData\Roaming) to the root folder of your function (i.e. alongside your function code):

cp -r C:\Users\<USER>\AppData\Roaming\nltk_data\* YOUR\ROOT\FOLDER\nltk_data\       
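The copy command above depends on the shell you use. If that is a problem, the same step can be sketched in Python with shutil. The function name bundle_nltk_data and both paths are placeholders, not anything defined by NLTK or Cloud Functions; point source_dir at wherever the downloader put nltk_data on your machine.

```python
import os
import shutil


def bundle_nltk_data(source_dir, project_root, folder_name="nltk_data"):
    """Copy a local nltk_data directory into the function's source folder.

    Hypothetical helper; returns the path of the copied folder.
    """
    target = os.path.join(project_root, folder_name)
    # dirs_exist_ok requires Python 3.8+; it lets the copy be re-run safely.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target


# Example call (assumed paths, adjust to your machine):
# bundle_nltk_data(r"C:\Users\<USER>\AppData\Roaming\nltk_data", ".")
```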

At the beginning of your main Python function, or just before using nltk, add the following code. Basically, it builds the path to the nltk_data folder and tells nltk to look inside it:

  import os
  import nltk

  root = os.path.dirname(os.path.abspath(__file__))
  download_dir = os.path.join(root, 'nltk_data')
  os.chdir(download_dir)
  # Tell nltk to search the bundled folder as well
  nltk.data.path.append(download_dir)

Finally, after committing and pushing (if you're using Cloud Source Repositories), (re)deploy your function!

Take a look at the instructions for uploading files with your Cloud Function. Since you can upload files alongside your code, you can configure nltk to use just those files:

Following the official NLTK documentation, you can "Set your NLTK_DATA environment variable to point to your top level nltk_data folder."

Combining these, you'd get:

  1. Download the data (on your computer) with python -m nltk.downloader punkt
  2. Upload the NLTK directory (find its path on your computer in the above documentation) as an nltk_data directory, created at the root of your function environment
  3. Configure the code to find that folder:

     import os

     # Absolute path of the folder that contains this script
     root = os.path.dirname(os.path.abspath(__file__))
     nltk_dir = os.path.join(root, 'nltk_data')  # Your folder name here
     os.environ['NLTK_DATA'] = nltk_dir
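One thing to watch with the environment variable approach: as far as I know, NLTK builds its search path from NLTK_DATA when the nltk package is first imported, so the variable has to be set before that import. A minimal sketch under that assumption follows; the helper name point_nltk_at_bundled_data is hypothetical and the folder name is whatever you uploaded.

```python
import os


def point_nltk_at_bundled_data(script_file, folder_name="nltk_data"):
    """Set NLTK_DATA to a folder that sits next to script_file.

    Hypothetical helper; returns the path it stored in the environment.
    """
    root = os.path.dirname(os.path.abspath(script_file))
    nltk_dir = os.path.join(root, folder_name)
    os.environ["NLTK_DATA"] = nltk_dir
    return nltk_dir


# Intended order in main.py (assumption: nltk has not been imported yet):
#   point_nltk_at_bundled_data(__file__)
#   import nltk
```

If nltk is already imported somewhere earlier, appending to nltk.data.path directly is the more predictable route.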

EDIT: It seems that exporting the path through the environment variable doesn't achieve the desired effect, so let's make the path explicit in the code:

  1. On your computer, download the data:

    import os
    import nltk

    download_dir = os.path.abspath('my_nltk_dir')
    os.makedirs(download_dir, exist_ok=True)  # don't fail if the folder already exists
    nltk.download('punkt', download_dir=download_dir)
  2. Add the directory my_nltk_dir to the same folder as your Python script. This would be:

    PROJECT_ROOT/
    |-- my_code.py
    |-- my_nltk_dir/
    |-- ...
  3. In your code, refer to the data using:

    import os
    import nltk.data

    root = os.path.dirname(os.path.abspath(__file__))
    download_dir = os.path.join(root, 'my_nltk_dir')
    nltk.data.load(
        os.path.join(download_dir, 'tokenizers/punkt/english.pickle')
    )
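Since the original concern was speed, it may also help to load the tokenizer only once per function instance and to fail with a clear message if the bundled file is missing. The sketch below assumes the folder layout from the steps above and an NLTK version that still ships punkt as english.pickle. The names punkt_pickle_path and load_tokenizer are made up for illustration.

```python
import os
from functools import lru_cache


def punkt_pickle_path(script_file, data_dir_name="my_nltk_dir"):
    """Build the path to the bundled english.pickle next to script_file."""
    root = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(root, data_dir_name, "tokenizers", "punkt", "english.pickle")


@lru_cache(maxsize=1)
def load_tokenizer(pickle_path):
    """Load the tokenizer once and reuse it on later calls in the same process."""
    if not os.path.isfile(pickle_path):
        raise FileNotFoundError(
            "Bundled punkt data not found at %s; was the folder deployed?" % pickle_path
        )
    import nltk.data  # imported lazily so the path check runs first

    return nltk.data.load(pickle_path)


# Inside the function handler (assumed usage):
#   tokenizer = load_tokenizer(punkt_pickle_path(__file__))
#   sentences = tokenizer.tokenize(text)
```

Because a warm Cloud Functions instance keeps module-level state between invocations, the cached tokenizer is reused until the instance is recycled.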
