python nltk download from external url
The server I run my nltk tests on does not have direct access to the external nltk models at http://www.nltk.org/nltk_data/, but we do have a private mirror set up to access the nltk models.
How can I tell the nltk downloader to install from the private mirror instead of http://www.nltk.org/nltk_data/?
I was expecting this to work, but it does not:
>>> nltk.downloader.Downloader(server_index_url='https://MyNltkMirror/index.xml').download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d
Download which package (l=list; x=cancel)?
Identifier> abc
Downloading package abc to /path/to/nltk_data...
Error downloading 'abc' from
<https://raw.githubusercontent.com/nltk/nltk_data/gh-
pages/packages/corpora/abc.zip>: <urlopen error [Errno 104]
Connection reset by peer>
Or is it possible that I am doing this right and there is an access issue connecting to raw.githubusercontent.com from my server?
Thanks.
Try downloading the package(s) without using the interactive mode.
import nltk

# Your mirror: the URL of the index.xml that the downloader should read.
mirror_url = "http://example.com/my_corpus_data/index.xml"
dler = nltk.downloader.Downloader(server_index_url=mirror_url)
# Directly download the package(s) without using the interactive mode.
dler.download('popular')
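If the download still tries to reach raw.githubusercontent.com, as in the error shown in the question, one possible cause is that the mirror's index.xml was copied unchanged and its package entries still point at the upstream host. The following is a minimal sketch for checking that, assuming the index uses <package> elements with id and url attributes. The helper name package_hosts and the inline SAMPLE_INDEX are hypothetical and only for illustration; in practice you would feed it the text of your own mirror's index.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def package_hosts(index_xml):
    """Map each package id in an index document to the host its url points at."""
    root = ET.fromstring(index_xml)
    hosts = {}
    for pkg in root.iter("package"):
        url = pkg.get("url")
        if url:  # skip entries that carry no url attribute
            hosts[pkg.get("id")] = urlparse(url).netloc
    return hosts


# Hypothetical sample mimicking a mirrored index that still lists upstream URLs.
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="abc"
             url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip" />
  </packages>
</nltk_data>"""

hosts = package_hosts(SAMPLE_INDEX)
for pkg_id, host in hosts.items():
    print(pkg_id, "->", host)
```

If the hosts listed are not your mirror, the url attributes in the mirrored index.xml would probably need to be rewritten to point at the mirror before the downloader can fetch the packages from it.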