
NLTK TweetTokenizer not working (Python)

I have installed NLTK and run nltk.download(). However, not all of the packages get installed (the download gets stuck on panlex_lite).

When I try to import TweetTokenizer, I get this error:

  File "create_docs.py", line 7, in <module>
    from nltk.tokenize import TweetTokenizer
ImportError: cannot import name TweetTokenizer

How can I deal with this? Cheers!

This happens because the packages were not installed properly. Skip the "panlex_lite" package and it should work.

There is currently an open issue for this. The workaround suggested there is as follows:

I guess, we could add something like if id != 'panlex_lite' to the code...

But, as for me, the easiest way looks like this:

1. Get https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
2. Remove panlex from it
3. Upload it to a public Gist
4. Pass the Gist's URL to the downloader: python -m nltk.downloader -d /usr/local/share/nltk_data -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all
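
The "get the index and remove panlex from it" steps can be scripted rather than done by hand. The following is only a rough sketch: the helper names strip_package and fetch_and_strip are made up for illustration, and it assumes the index lists packages and collection items as XML elements carrying an id or ref attribute. Uploading the result to a Gist is still a manual step.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# URL taken from the steps above.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def strip_package(index_xml, package_id="panlex_lite"):
    """Return the index XML (as text) without any element that names package_id.

    Assumption: the package is referenced through an `id` or `ref` attribute.
    """
    root = ET.fromstring(index_xml)
    # Snapshot the tree first so elements can be removed while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") == package_id or child.get("ref") == package_id:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def fetch_and_strip(out_path="index.xml", package_id="panlex_lite"):
    """Download the official index and write a copy without package_id."""
    with urlopen(INDEX_URL) as resp:
        data = resp.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(strip_package(data, package_id))
```

Calling fetch_and_strip() writes a cleaned index.xml to the current directory, which can then be uploaded and passed to the downloader with -u as in the last step.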

Here is the link to the issue: look at the last two comments.
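
The "if id != 'panlex_lite'" idea can also be tried from your own script, without patching NLTK or hosting a modified index. This is a minimal sketch, not the fix proposed in the issue: wanted_ids and download_all_except are hypothetical helper names, and it assumes the package index itself can still be fetched (only the panlex_lite download is the problem).

```python
def wanted_ids(package_ids, skip=("panlex_lite",)):
    """Keep every package id that is not in the skip list."""
    return [pid for pid in package_ids if pid not in skip]


def download_all_except(skip=("panlex_lite",), download_dir=None):
    """Download every package listed in the NLTK index except those in skip."""
    # Imported here so the filtering helper above works without NLTK installed.
    import nltk
    from nltk.downloader import Downloader

    ids = [pkg.id for pkg in Downloader().packages()]
    for pid in wanted_ids(ids, skip):
        nltk.download(pid, download_dir=download_dir)
```

For example, download_all_except(download_dir="/usr/local/share/nltk_data") would fetch the packages one by one into the same directory used in the command above.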

