
Error installing NLTK supporting packages: nltk.download()

I have installed the nltk package. When I then try to download the supporting packages using nltk.download(), I get this error:

[Errno 11001] getaddrinfo

My machine / software details are:

OS: Windows 8.1
Python: 3.3.4
NLTK package: 3.0

Below are the commands I ran in Python:

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
True
>>> nltk.download("all")
[nltk_data] Error loading all: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
False


It looks like it is going to http://nltk.github.com/nltk_data/, whereas ideally it should fetch the data from http://www.nltk.org/nltk_data/.

On another machine, when we type http://nltk.github.com/nltk_data/ into the browser, it redirects to http://www.nltk.org/nltk_data/. I do not understand why the redirection is not happening on my laptop.

I suspect that this is the cause of the error.

Kindly help.

I have added a screenshot of the command prompt. I need help with this.

[Screenshot of the command prompt]

Regards, Bonson

Try the code below. It downloaded the packages as expected for me.

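import nltk
import ssl

# Workaround: make the default HTTPS context skip certificate verification.
# This disables a security check, so use it only on a network you trust.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Older Python versions do not verify HTTPS certificates by default.
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()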

It looks like the link was failing before, and the ssl workaround fixed it.

Note: this was on a Mac.

I got this error because of a network restriction. Here is how I solved it:

I browsed to http://www.nltk.org/nltk_data/ and downloaded the required corpora from the corresponding links.

I then placed the downloaded files under C:/ on Windows (or another relevant directory such as C:/ProgramData/Anaconda3), using the same folder structure as in https://github.com/nltk/nltk_data/tree/gh-pages/packages
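
The manual placement described above could be scripted roughly as follows. This is only a sketch: the helper name install_nltk_zip, its category argument and the example paths are hypothetical, and it assumes each downloaded archive contains a single top-level folder named after the package.

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, category, data_root):
    """Unpack one manually downloaded archive into <data_root>/<category>/.

    `category` is a folder name from the packages tree, e.g. "corpora".
    """
    target = Path(data_root) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, install_nltk_zip("stopwords.zip", "corpora", "C:/nltk_data") would leave the files under C:/nltk_data/corpora/stopwords, provided the archive has that top-level folder. If the data ends up in a directory NLTK does not search by default, the NLTK_DATA environment variable can point to it.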

I found the solution. In my case, when the NLTK downloader started, its server index was set to http://nltk.github.com/nltk_data/

This needs to be changed to http://nltk.org/nltk_data/

You can change it in the NLTK Downloader window via File -> Change Server Index.

Regards, Bonson
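
The same server-index change can probably be made without the GUI. The sketch below assumes that the Downloader class in nltk.downloader accepts a server_index_url argument, as documented for NLTK 3.x. The helper name download_from is hypothetical, and the URL is simply the one quoted in the answer above, which may no longer be current.

```python
# Server index quoted in the answer above; it may have changed since then.
SERVER_INDEX_URL = "http://nltk.org/nltk_data/"


def download_from(package_id, server_index_url=SERVER_INDEX_URL):
    """Download one NLTK package using an explicit server index."""
    # Imported lazily so that defining this helper does not require NLTK.
    from nltk.downloader import Downloader

    downloader = Downloader(server_index_url=server_index_url)
    return downloader.download(package_id)
```

Calling download_from("all") would then mirror the nltk.download("all") call from the question, but against the chosen index.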

Setting the HTTP and HTTPS proxies in environment variables resolved the issue for me:

set http_proxy=http://IPN:PWD@ipaddress:port
set https_proxy=https://IPN:PWD@ipaddress:port

Ask your network or admin team for the proxy IP address.
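
For anyone who prefers to set this from Python rather than the command prompt, a minimal sketch follows. It assumes the variables are set before the download starts and that the downloader goes through urllib, which reads http_proxy and https_proxy from the environment. The function name use_proxy and the proxy.example.com addresses are placeholders.

```python
import os
import urllib.request


def use_proxy(http_proxy, https_proxy):
    """Set proxy variables for this process and return what urllib sees."""
    os.environ["http_proxy"] = http_proxy
    os.environ["https_proxy"] = https_proxy
    # getproxies() reports the proxy mapping urllib will pick up.
    return urllib.request.getproxies()


proxies = use_proxy(
    "http://proxy.example.com:8080",   # placeholder, not a real proxy
    "https://proxy.example.com:8080",  # placeholder, not a real proxy
)
print(proxies["http"], proxies["https"])
```

Unlike the set commands above, this only affects the current Python process, so it has to run in the same session as the download.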

The error might be caused by the proxy configured on the system. See the following question, where I have posted an answer:

Error in downloading NLTK data: [Errno 11004] getaddrinfo failed

I was facing this issue in my Jupyter notebook as well. The code snippet below, from another Stack Overflow answer, helped. Posting it in case it helps someone else:

import socket
socket.getaddrinfo('localhost', 8080)

Ref: "getaddrinfo failed", what does that mean?
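
Since "getaddrinfo failed" means a host name could not be resolved, the same call can be pointed at the NLTK hosts from the question to see which of them the machine can look up. This is a rough diagnostic sketch; the helper name can_resolve is hypothetical, and port 80 is assumed because the URLs in the question use plain http.

```python
import socket


def can_resolve(host, port=80):
    """Return True if the host name resolves, False if getaddrinfo fails."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True
```

Comparing can_resolve("nltk.github.com") with can_resolve("www.nltk.org") on the affected machine would show whether the failure is specific to one host or affects name resolution in general, which usually points to a proxy or DNS setting.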

We can also download the packages from the Python prompt or from within notebooks with the following configuration. The proxy URL can be http or https, depending on your proxy settings.

import nltk
nltk.set_proxy('http://username:password@proxy.example.com:port')
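
One practical detail with proxy URLs of this form: if the username or password contains characters such as "@" or ":", the URL becomes ambiguous. A common approach, sketched below, is to percent-encode the credentials first. The helper name build_proxy_url and all values are placeholders, and whether a given proxy accepts credentials this way is an assumption to verify locally.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" also encodes "/", "@" and ":" inside the credentials.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


proxy_url = build_proxy_url("username", "p@ss:word", "proxy.example.com", 8080)
print(proxy_url)  # http://username:p%40ss%3Aword@proxy.example.com:8080
```

The resulting string can then be passed to nltk.set_proxy(...) in place of the hand-written URL.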

The problem is with the Jio network (both SIM and Fiber). Try downloading through another Internet Service Provider, such as Airtel or BSNL.


 