
nltk PlaintextCorpusReader sents and paras functions not working

I can't get the paras and sents functions of PlaintextCorpusReader to work. Here is my code:

import nltk
from nltk.corpus import PlaintextCorpusReader

corpus_root = './dir_root'
newcorpus = PlaintextCorpusReader(corpus_root, '.*') # Files you want to add

word_list = newcorpus.words('file1.txt')
sentence_list = newcorpus.sents('file1.txt')
paragraph_list = newcorpus.paras('file1.txt')

print(word_list)
print(sentence_list)
print(paragraph_list)

word_list is fine:

['__________________________________________________________________', 'Title', ...]

However, both paragraph_list and sentence_list give this error:

Traceback (most recent call last):
  File "corpus.py", line 13, in <module>
    print(sentence_list)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/collections.py", line 225, in __repr__
    for elt in self:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/corpus/reader/util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/corpus/reader/plaintext.py", line 129, in _read_sent_block
    for sent in self._sent_tokenizer.tokenize(para)])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 956, in __getattr__
    self.__load()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 948, in __load
    resource = load(self._path)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 808, in load
    opened_resource = _open(resource_url)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 926, in _open
    return find(path_, path + ['']).open()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/data.py", line 648, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource 'tokenizers/punkt/PY3/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/Users/username/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

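I tried using nltk.download() to download the files into the corpus, but that doesn't work either. It also doesn't seem like that should be necessary, since PlaintextCorpusReader is already doing that: the paras and sents functions are part of PlaintextCorpusReader. Do I need to pass a specific fileid? Or is some kind of regular-expression argument needed to find sentences or paragraphs? The documentation and source code don't seem to say that these need anything more than the words function does.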

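You are missing a data file (a "resource") that the sentence tokenizer requires. Fix this by downloading the "punkt" resource under "Models" in the interactive downloader, or non-interactively by running the following code once: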

nltk.download("punkt")

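To avoid running into this kind of problem repeatedly while exploring nltk, I recommend downloading the "book" bundle right away. It contains everything you are likely to need for a while.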
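If the one-time download should happen from inside a script rather than by hand, the check can be sketched roughly as follows. This is only an illustration, not part of the original answer: `ensure_nltk_resource` is a hypothetical helper name, and its `find` and `download` parameters exist only so the logic can be exercised without network access. By default it relies on `nltk.data.find`, which raises `LookupError` when a resource is missing (the same error shown in the traceback), and on `nltk.download`. Newer NLTK releases may ask for a differently named resource (for example "punkt_tab"); the error message names the one to fetch.

```python
def ensure_nltk_resource(resource_path, package_name, find=None, download=None):
    """Download an NLTK data package only if it is not already installed.

    Hypothetical helper for illustration. `find` and `download` default to
    nltk.data.find and nltk.download; they can be replaced for testing.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        # Import lazily so the helper can be exercised without nltk installed.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is absent.
        find(resource_path)
        return False
    except LookupError:
        download(package_name)
        return True
```

Calling `ensure_nltk_resource("tokenizers/punkt", "punkt")` once before `newcorpus.sents('file1.txt')` should then be enough for the sentence and paragraph readers in the question, assuming the download itself succeeds.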
