How to use the Regexp Tagger in nltk?

Question

If I try this code:

import nltk
pattern = [(r'(March)$','MAR')]
tagger=nltk.RegexpTagger(pattern)
print tagger.tag('He was born in March 1991')

I get an output like this:

[('H', None), ('e', None), (' ', None), ('w', None), ('a', None), ('s', None), (' ', None), ('b', None), ('o', None), ('r', None), ('n', None), (' ', None), ('i', None), ('n', None), (' ', None), ('M', None), ('a', None), ('r', None), ('c', None), ('h', None), (' ', None), ('1', None), ('9', None), ('9', None), ('1', None)]

In fact, I would like this tagger to recognise the word 'March' and give it the 'MAR' tag.

Answer 1

Here, try this:

import nltk
pattern = [(r'(March)$','MAR')]
tagger = nltk.RegexpTagger(pattern)
print tagger.tag(nltk.word_tokenize('He was born in March 1991'))

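You have to tokenize the sentence first. The tag method expects a list of tokens; if you pass it a plain string, it walks the string one character at a time, which is why every character in your output was tagged separately.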

This is the output I get:

[('He', None), ('was', None), ('born', None), ('in', None), ('March', 'MAR'), ('1991', None)]
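For anyone on Python 3, here is a minimal sketch of the same idea with print as a function. It assumes nltk is installed; str.split() stands in for nltk.word_tokenize only so the example runs without downloading tokenizer data, and for this particular sentence both give the same tokens.

```python
import nltk

# Same pattern as in the question: a token ending in "March" gets the tag MAR.
pattern = [(r'(March)$', 'MAR')]
tagger = nltk.RegexpTagger(pattern)

sentence = 'He was born in March 1991'

# Iterating over a str yields single characters, which is what the
# tagger sees when it is handed the raw sentence.
first_items = list(sentence)[:3]

# Assumption for this sketch: whitespace splitting is a good enough
# tokenizer here. Real text with punctuation needs nltk.word_tokenize.
tokens = sentence.split()
token_tags = tagger.tag(tokens)

print(first_items)
print(token_tags)
```

Tokens that match no pattern come back with None as their tag, as in the output above.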

Answer 1: 6, accepted, 2013-01-26 03:17:25
