Backoff Tagger in nltk
I am new to Python coding. I want to use the UnigramTagger along with a backoff (which in my case is a RegexpTagger), and I have been struggling to figure out what the error below means. I'd appreciate any help on this.
>>> train_sents = (['@Sakshi', 'Hi', 'I', 'am', 'meeting', 'my', 'friend', 'today'])
>>> from tag_util import patterns
>>> from nltk.tag import RegexpTagger
>>> re_tagger = RegexpTagger(patterns)
>>> from nltk.tag import UnigramTagger
>>> from tag_util import backoff_tagger
>>> tagger = backoff_tagger(train_sents, UnigramTagger, backoff=re_tagger)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
tagger = backoff_tagger(train_sents, UnigramTagger, backoff=re_tagger)
File "tag_util.py", line 12, in backoff_tagger
for cls in tagger_classes:
TypeError: 'YAMLObjectMetaclass' object is not iterable
This is the code I have in tag_util for patterns and backoff_tagger:
import re
patterns = [
    (r'^@\w+', 'NNP'),
    (r'^\d+$', 'CD'),
    (r'.*ing$', 'VBG'),  # gerunds, e.g. wondering
    (r'.*ment$', 'NN'),
    (r'.*ful$', 'JJ'),   # e.g. wonderful
    (r'.*', 'NN')
]
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff
You only need to change a few things for this to work.
The error you are getting is because backoff_tagger loops over tagger_classes, but you passed it the class UnigramTagger itself, and you cannot iterate over a class. I'm not sure if you had something else in mind, but you can just remove the for loop. Also, you need to pass UnigramTagger a list of tagged sentences, each represented as a list of (word, tag) tuples, not just a list of words. Otherwise, it doesn't know how to train. Part of this might look like:
[[('@Sakshi', 'NN'), ('Hi', 'NN'),...],...[('Another', 'NN'), ('sentence', 'NN')]]
Notice here that each sentence is itself a list. Also, you can use a tagged corpus from NLTK for this (which I recommend).
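The fix described above (drop the loop and train UnigramTagger directly on tagged sentences) can be sketched as follows. The training sentence and its Brown-style tags are hypothetical, made up for illustration; in practice you would train on a real tagged corpus. Any word the UnigramTagger did not learn from the training data falls through to the RegexpTagger.

```python
from nltk.tag import RegexpTagger, UnigramTagger

# Same regex patterns as in the question; the final '.*' catches everything else.
patterns = [
    (r'^@\w+', 'NNP'),
    (r'^\d+$', 'CD'),
    (r'.*ing$', 'VBG'),
    (r'.*ment$', 'NN'),
    (r'.*ful$', 'JJ'),
    (r'.*', 'NN'),
]
re_tagger = RegexpTagger(patterns)

# Hypothetical hand-tagged training data: a list of sentences,
# where each sentence is a list of (word, tag) tuples.
train_sents = [
    [('Hi', 'UH'), ('I', 'PPSS'), ('am', 'BEM'), ('meeting', 'VBG'),
     ('my', 'PP$'), ('friend', 'NN'), ('today', 'NR')],
]

# No loop: pass the tagged sentences straight to UnigramTagger.
tagger = UnigramTagger(train_sents, backoff=re_tagger)

tagged = tagger.tag(['@Sakshi', 'Hi', 'I', 'am', 'meeting', '42', 'friends'])
print(tagged)
```

Here '@Sakshi', '42' and 'friends' never appear in the training data, so they get their tags (NNP, CD and NN) from the regex patterns.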
Edit:
After reading your post, it seems to me that you're both confused about what input and output to expect from certain functions and lacking an understanding of training in the NLP sense. I think you would greatly benefit from reading the NLTK book, starting at the beginning.
I'm glad to show you how to fix this, but I don't think you'll have a complete understanding of the underlying mechanisms without some more research.
tag_util.py (based on your code):
from nltk.tag import RegexpTagger, UnigramTagger
from nltk.corpus import brown  # the corpus data must be downloaded once: nltk.download('brown')
patterns = [
    (r'^@\w+', 'NNP'),
    (r'^\d+$', 'CD'),
    (r'.*ing$', 'VBG'),
    (r'.*ment$', 'NN'),
    (r'.*ful$', 'JJ'),
    (r'.*', 'NN')
]
re_tagger = RegexpTagger(patterns)
# train the unigram tagger on Brown; unseen words fall back to the regex patterns
tagger = UnigramTagger(brown.tagged_sents(), backoff=re_tagger)
In the Python interpreter:
>>> import tag_util
>>> tag_util.brown.tagged_sents()[:2]
[[('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'), ('investigation', 'NN'), ('of', 'IN'), ("Atlanta's", 'NP$'), ('recent', 'JJ'), ('primary', 'NN'), ('election', 'NN'), ('produced', 'VBD'), ('``', '``'), ('no', 'AT'), ('evidence', 'NN'), ("''", "''"), ('that', 'CS'), ('any', 'DTI'), ('irregularities', 'NNS'), ('took', 'VBD'), ('place', 'NN'), ('.', '.')], [('The', 'AT'), ('jury', 'NN'), ('further', 'RBR'), ('said', 'VBD'), ('in', 'IN'), ('term-end', 'NN'), ('presentments', 'NNS'), ('that', 'CS'), ('the', 'AT'), ('City', 'NN-TL'), ('Executive', 'JJ-TL'), ('Committee', 'NN-TL'), (',', ','), ('which', 'WDT'), ('had', 'HVD'), ('over-all', 'JJ'), ('charge', 'NN'), ('of', 'IN'), ('the', 'AT'), ('election', 'NN'), (',', ','), ('``', '``'), ('deserves', 'VBZ'), ('the', 'AT'), ('praise', 'NN'), ('and', 'CC'), ('thanks', 'NNS'), ('of', 'IN'), ('the', 'AT'), ('City', 'NN-TL'), ('of', 'IN-TL'), ('Atlanta', 'NP-TL'), ("''", "''"), ('for', 'IN'), ('the', 'AT'), ('manner', 'NN'), ('in', 'IN'), ('which', 'WDT'), ('the', 'AT'), ('election', 'NN'), ('was', 'BEDZ'), ('conducted', 'VBN'), ('.', '.')]]
Notice the output here. I am getting the first two sentences from the Brown corpus of tagged sentences. This is the kind of data you need to pass to a tagger (like the UnigramTagger) as input to train it. Now let's use the tagger we trained in tag_util.py.
Back to the Python interpreter:
>>> tag_util.tagger.tag(['I', 'just', 'drank', 'some', 'coffee', '.'])
[('I', 'PPSS'), ('just', 'RB'), ('drank', 'VBD'), ('some', 'DTI'), ('coffee', 'NN'), ('.', '.')]
And there you have it: the words of a sentence POS-tagged using your approach.
If you are using the backoff_tagger that I am thinking of, UnigramTagger should be an item of a list, as below:
tagger = backoff_tagger(train_sents, [UnigramTagger], backoff=re_tagger)
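A minimal sketch of this list-based approach, using the question's own backoff_tagger helper (the loop is unchanged, only indented). It assumes a hypothetical hand-tagged sentence and a shortened pattern list, and it also reproduces the original TypeError by passing the bare class. The exact wording of the error depends on your NLTK version: the metaclass named in the message ('YAMLObjectMetaclass' in the question) may be different.

```python
from nltk.tag import RegexpTagger, UnigramTagger

def backoff_tagger(train_sents, tagger_classes, backoff=None):
    # Each class is trained with the previously built tagger as its backoff.
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff

# Shortened version of the question's patterns.
re_tagger = RegexpTagger([(r'^\d+$', 'CD'), (r'.*ing$', 'VBG'), (r'.*', 'NN')])

# Hypothetical hand-tagged training data (a list of sentences of (word, tag) tuples).
train_sents = [
    [('I', 'PPSS'), ('am', 'BEM'), ('meeting', 'VBG'), ('my', 'PP$'), ('friend', 'NN')],
]

# Passing the bare class reproduces the error: a class object is not iterable.
try:
    backoff_tagger(train_sents, UnigramTagger, backoff=re_tagger)
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)

# Wrapping the class in a list works.
tagger = backoff_tagger(train_sents, [UnigramTagger], backoff=re_tagger)
print(tagger.tag(['I', 'am', 'meeting', '3', 'friends']))
```

Because each class is trained with the previously built tagger as its backoff, a longer list such as [UnigramTagger, BigramTagger] would return a BigramTagger that falls back to the UnigramTagger, which in turn falls back to the RegexpTagger.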
I hope it helps.