
Bad zip file error while using nltk pos tagger

I'm trying to use the NLTK POS tagger, but I'm getting a "zipfile.BadZipfile: File is not a zip file" error.

The error comes from this code:

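import nltk
sentence = "I love python"
# Split the sentence into word tokens
tokens = nltk.word_tokenize(sentence)
# Tag each token with its part of speech
pos_tags = nltk.pos_tag(tokens)
# Chunk named entities; binary=True labels them all as NE
print(nltk.ne_chunk(pos_tags, binary=True))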

I found this question related to my problem. Unfortunately, I can't download the entire corpus, since I'm working on a server and have a lot of memory restrictions. Can someone point me to the particular file I need, so I can download just that one instead of all the corpora?

(I'm using Python 2.7.6)

Try these:

nltk.download("maxent_treebank_pos_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("punkt")

The first two are for POS tagging and named entities, respectively. The third isn't used in your code sample, but you'll need it for nltk.sent_tokenize(), which breaks up plain text into sentences. Since you'll be working with POS tags, I'd also download these (they're tiny):

nltk.download(["tagsets", "universal_tagset"])

If you do have a bit of space, downloading the entire "book" collection will give you everything you need to explore the NLTK.
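
If the error persists after downloading, one possible cause (an assumption on my part, not something confirmed in the question) is a partially downloaded or corrupted archive left over from an earlier attempt. Here is a minimal sketch, using only the standard library, that lists any .zip files under a data directory that Python cannot read as zip archives. The helper name find_bad_zips and the ~/nltk_data location are illustrative assumptions; point it at wherever your NLTK data actually lives.

```python
import os
import zipfile


def find_bad_zips(root):
    """Return sorted paths of *.zip files under root that are not valid zip archives."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".zip"):
                path = os.path.join(dirpath, name)
                # is_zipfile() returns False for files without a valid zip structure
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Assumption: NLTK data is stored in ~/nltk_data; adjust for your server.
    data_dir = os.path.expanduser("~/nltk_data")
    for path in find_bad_zips(data_dir):
        print("corrupt or incomplete: " + path)
```

Deleting any file it reports and re-running the matching nltk.download(...) call should fetch a fresh copy of just that package.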

