
Python NLTK download from an external URL

The server I run NLTK tests on has no direct access to the external NLTK models at http://www.nltk.org/nltk_data/, but we do have a private mirror set up to serve those models.

How can I tell the NLTK downloader to install from the private mirror instead of http://www.nltk.org/nltk_data/?

I expected this to work, but it does not:

>>> nltk.downloader.Downloader(server_index_url='https://MyNltkMirror/index.xml').download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> abc
    Downloading package abc to /path/to/nltk_data...
    Error downloading 'abc' from
        <https://raw.githubusercontent.com/nltk/nltk_data/gh-
        pages/packages/corpora/abc.zip>:   <urlopen error [Errno 104]
        Connection reset by peer>

Or is it possible that I am doing this correctly and the real problem is that my server cannot connect to raw.githubusercontent.com?

Thanks.

Try downloading the package(s) without using the interactive mode.

import nltk.downloader

# Your mirror's index file.
mirror_url = "http://example.com/my_corpus_data/index.xml"
dler = nltk.downloader.Downloader(server_index_url=mirror_url)

# Directly download the package(s) without using the interactive mode.
dler.download('popular')
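The error in the question names raw.githubusercontent.com, not the mirror, so it may be worth checking the mirror's index.xml itself. As far as I can tell, `server_index_url` only controls where the index is fetched from. Each package entry in the index carries its own `url` attribute, and the downloader fetches the zip from that URL. If the mirror's index is an unmodified copy of the official one, the packages would still be pulled from GitHub. The sketch below counts the hosts that an index points to. The helper name `package_hosts` and the trimmed `SAMPLE_INDEX` are made up for illustration; in practice you would pass in the text of your mirror's real index.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def package_hosts(index_xml: str) -> Counter:
    """Count the hosts that package URLs in an NLTK-style index point to."""
    root = ET.fromstring(index_xml)
    hosts = Counter()
    # Assumes the usual layout: <nltk_data><packages><package url="..."/>...
    for pkg in root.findall("packages/package"):
        hosts[urlparse(pkg.get("url", "")).netloc] += 1
    return hosts


# Hypothetical, trimmed-down index used only to demonstrate the helper.
SAMPLE_INDEX = """
<nltk_data>
  <packages>
    <package id="abc" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip" />
    <package id="demo" url="https://MyNltkMirror/packages/corpora/demo.zip" />
  </packages>
</nltk_data>
"""

hosts = package_hosts(SAMPLE_INDEX)
for host, count in hosts.items():
    print(f"{host}: {count} package(s)")
```

If any host other than the mirror shows up, the `url` attributes in the mirror's index.xml probably need to be rewritten to point at the mirror's own copies of the zip files.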
