Zip file error when using the NLTK POS tagger

Question

I'm trying to use the NLTK POS tagger, but I'm getting a "zipfile.BadZipfile: File is not a zip file" error.

The error comes from this code:

import nltk
sentence = "I love python"
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)
print nltk.ne_chunk(pos_tags, binary=True)

I found this question related to my problem. Unfortunately, I can't download the entire corpus since I'm working on a server and have a lot of memory restrictions. Can someone point me to the particular file I need, so I can download just that one instead of the whole collection?

(I'm using Python 2.7.6)

Answer 1

Try these:

nltk.download("maxent_treebank_pos_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("punkt")

The first two are for POS tagging and named entities, respectively. The third isn't used in your code sample, but you'll need it for nltk.sent_tokenize(), which breaks up plain text into sentences. Since you'll be working with POS tags, I'd also download these (they're tiny):

nltk.download(["tagsets", "universal_tagset"])

If you do have a bit of space, downloading the entire "book" collection will give you everything you need to explore the NLTK.
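The "File is not a zip file" message comes from Python's zipfile module when it cannot read an archive. One possible cause, which this thread does not confirm, is a truncated or corrupted package left behind by an interrupted download. The sketch below walks a data directory and lists any .zip files that are not valid archives. The helper name find_bad_zips and the ~/nltk_data location are assumptions for illustration; NLTK also looks in the other directories listed in nltk.data.path.

```python
import os
import zipfile


def find_bad_zips(data_dir):
    """Return sorted paths of *.zip files under data_dir that are not valid zip archives.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    bad = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".zip"):
                path = os.path.join(root, name)
                # is_zipfile() checks for the zip end-of-central-directory record
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Assumed location of the NLTK data directory; adjust for your server.
    data_dir = os.path.expanduser("~/nltk_data")
    for path in find_bad_zips(data_dir):
        print(path)
```

If this reports a file, deleting it and calling nltk.download() again for that one package id should fetch a fresh copy without pulling in anything else.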
