NLTK: Package error? Punkt and pickle?

Question

[Screenshot: command prompt error]

Basically, I have no idea why I'm getting this error.

So that there is more than an image, here is a similar message in code format. Because it is more recent, the error message itself already mentions the answer from this thread:

Preprocessing raw texts ...

---------------------------------------------------------------------------

LookupError                               Traceback (most recent call last)

<ipython-input-38-263240bbee7e> in <module>()
----> 1 main()

7 frames

<ipython-input-32-62fa346501e8> in main()
     32     data = data.fillna('')  # only the comments has NaN's
     33     rws = data.abstract
---> 34     sentences, token_lists, idx_in = preprocess(rws, samp_size=samp_size)
     35     # Define the topic model object
     36     #tm = Topic_Model(k = 10), method = TFIDF)

<ipython-input-31-f75213289788> in preprocess(docs, samp_size)
     25     for i, idx in enumerate(samp):
     26         sentence = preprocess_sent(docs[idx])
---> 27         token_list = preprocess_word(sentence)
     28         if token_list:
     29             idx_in.append(idx)

<ipython-input-29-eddacbfa6443> in preprocess_word(s)
    179     if not s:
    180         return None
--> 181     w_list = word_tokenize(s)
    182     w_list = f_punct(w_list)
    183     w_list = f_noun(w_list)

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in word_tokenize(text, language, preserve_line)
    126     :type preserver_line: bool
    127     """
--> 128     sentences = [text] if preserve_line else sent_tokenize(text, language)
    129     return [token for sent in sentences
    130             for token in _treebank_word_tokenizer.tokenize(sent)]

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in sent_tokenize(text, language)
     92     :param language: the model name in the Punkt corpus
     93     """
---> 94     tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
     95     return tokenizer.tokenize(text)
     96 

/usr/local/lib/python3.7/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    832 
    833     # Load the resource.
--> 834     opened_resource = _open(resource_url)
    835 
    836     if format == 'raw':

/usr/local/lib/python3.7/dist-packages/nltk/data.py in _open(resource_url)
    950 
    951     if protocol is None or protocol.lower() == 'nltk':
--> 952         return find(path_, path + ['']).open()
    953     elif protocol.lower() == 'file':
    954         # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674 
    675 

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************

Answer 1

Run the following:

>>> import nltk
>>> nltk.download()

Then, when the downloader window pops up, select punkt in the Identifier column of the Models tab.

Answer 2

Do:

>>> import nltk
>>> nltk.download('punkt')
>>> from nltk import sent_tokenize
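If the download should not be repeated on every run, one option is to check for the resource first and download only when it is missing. The sketch below assumes the resource path `tokenizers/punkt` (taken from the traceback above); `ensure_resource` and `main` are hypothetical helper names, not part of NLTK. The lookup and download functions are passed in as arguments so the helper can be exercised without a network connection.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only when `find(resource_path)` raises LookupError.

    Returns True if a download was triggered, False if the resource
    was already present. (Hypothetical helper, not an NLTK function.)
    """
    try:
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False


def main():
    # nltk is imported here so the helper above has no hard dependency on it.
    import nltk

    downloaded = ensure_resource(
        "tokenizers/punkt", "punkt", nltk.data.find, nltk.download
    )
    print("downloaded punkt" if downloaded else "punkt already installed")
```

Calling `main()` once at program start, before the first `word_tokenize` or `sent_tokenize` call, should be enough to avoid the `LookupError` shown in the question.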

To download all datasets and models:

>>> nltk.download('all')

Make sure you have the latest version of NLTK, since it is actively maintained and continually improving:

$ pip install --upgrade nltk

Similar questions on Windows/Linux, in case the code snippets above don't help:

Similar question on Windows:

NLTK and Stopwords Fail #lookuperror

Similar question on Linux:

-

Similar question on OS X:

nltk.download() hangs on OS X

Similar question involving authorization/authentication errors:

-

Similar question about installing NLTK data outside of the Python interpreter:

Installing nltk data in setup.py script

Answer 3

To pre-install the punkt package from a single command line:

$ python -c 'import nltk; nltk.download("punkt")'
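When the pre-install step runs inside a build or provisioning script, it can help to fail loudly if a download does not succeed. The following is a minimal sketch under that assumption; `preinstall` and `main` are hypothetical names, and the code relies on `nltk.download` returning a truthy value on success and a falsy one on failure.

```python
import sys


def preinstall(packages, download):
    """Try to download each package; return the names that failed.

    `download` is expected to behave like nltk.download: truthy on
    success, falsy on failure. (Hypothetical helper, not part of NLTK.)
    """
    failed = []
    for name in packages:
        if not download(name):
            failed.append(name)
    return failed


def main(packages=("punkt",)):
    # Imported lazily so preinstall() can be used without nltk installed.
    import nltk

    failed = preinstall(packages, nltk.download)
    if failed:
        print("failed to download: " + ", ".join(failed), file=sys.stderr)
        return 1
    return 0
```

Ending the script with `sys.exit(main())` turns a failed download into a non-zero exit status, so the surrounding build step stops instead of producing an environment that later raises `LookupError`.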

3 answers

Answer 1: score 13, accepted, 2015-06-13 18:35:13
Answer 2: score 9, 2015-06-13 19:55:09
Answer 3: score 2, 2020-08-27 20:51:06
