
Error when using the Python TextBlob library tagger

I had the textblob library working fine for a while, but decided to install (using easy_install) an additional library (page here) claiming faster and more accurate tagging.

I couldn't get it working, so I uninstalled it, but it seems to have messed with the tagging function in TextBlob. I've uninstalled and reinstalled both nltk and TextBlob numerous times with both pip and easy_install, and made sure they're up to date.

Here is an example of a simple script which generates the error:

from textblob import TextBlob

blob = TextBlob("This is a sentence")
print repr(blob.tags)

And the error printed:

Traceback (most recent call last):
  File "tesst.py", line 5, in <module>
    print repr(blob.tags)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\blob.py", line 445, in pos_tags
    for word, t in self.pos_tagger.tag(self.raw)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 35, in decorated
    return func(*args, **kwargs)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\en\taggers.py", line 34, in tag
    tagged = nltk.tag.pos_tag(text)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>

You can see that the error actually mentions the perceptron tagger. Is there any way to more thoroughly remove any references there may be to the alternate tagger?

Also note that only the "tags" function has been affected.

This seems to be a problem with nltk version 3.2. Until it's fixed in a release, you can use the hack described in "NLTK v3.2: Unable to nltk.pos_tag()".
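The "unknown url type: c" at the end of the traceback is consistent with the drive letter of a Windows path being read as a URL scheme. Here is a minimal sketch of that mechanism using only the standard library. The path and the url_scheme helper are made up for illustration, and the "file:" prefix is an assumption about the general kind of workaround that avoids the problem, not a quote from the linked post.

```python
# Sketch: how a Windows path can end up reported as "unknown url type: c".
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as in the traceback above

# Hypothetical path, for illustration only.
model_path = r"C:\some\dir\model.pickle"


def url_scheme(location):
    """Return the scheme a URL parser reads from the given string."""
    return urlparse(location).scheme


# Without a prefix, the drive letter is parsed as the scheme.
bare_scheme = url_scheme(model_path)

# With an explicit "file:" prefix, the scheme is unambiguous.
prefixed_scheme = url_scheme("file:" + model_path)

print(bare_scheme)
print(prefixed_scheme)
```

If the first value printed is the drive letter rather than "file", anything that hands the bare path to a URL opener will fail the way the traceback shows.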

I found out why I was having trouble with the ap tagger. My issue is solved here. More specifically, by the comment: "Another option is to install nltk and then change 'from textblob.packages import nltk' to 'import nltk' [in the taggers.py] file."

(Note that this doesn't correspond to the error message above: that error was coming up without aptagger installed. I was getting another error with it installed, and this is a solution for that.)
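Before editing taggers.py, it may help to confirm which copies of nltk and textblob Python actually imports, since repeated installs with pip and easy_install can leave more than one around. A rough sketch follows; describe_module is a hypothetical helper name, not part of either library.

```python
from __future__ import print_function

import importlib


def describe_module(name):
    """Return (version, path) for an importable module, or None if missing.

    describe_module is a hypothetical helper, not part of nltk or textblob.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines these attributes, so fall back to placeholders.
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path


for name in ("nltk", "textblob"):
    print(name, describe_module(name))
```

The printed path shows which site-packages directory each import resolves to, which is the copy of taggers.py or nltk that would need changing.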
