Self-define lemmatized words and append to WordNetLemmatizer

I would like to add some exceptions to the lemmatization results. For example, when I test wnl.lemmatize('cookies'), the result I get is cooky instead of cookie. How can I update the lemmatization result to cookie?

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer 
wnl = WordNetLemmatizer()

def text_cleaning(text):
  text = text.lower()
  tok_list = [wnl.lemmatize(w,tag[0].lower()) if tag[0].lower() in ['a','n','v'] else wnl.lemmatize(w) for w,tag in pos_tag(word_tokenize(text))]
  return ' '.join(tok_list)

Looking through the implementation found here, you can probably do something like:

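class WNWrapper(WordNetLemmatizer):
    def __init__(self, custom_transforms):
        super().__init__()
        # Maps a word to the lemma that should override WordNet's answer.
        self.custom_transforms = custom_transforms

    def lemmatize(self, word, pos='n'):
        # Check the exceptions first; otherwise defer to WordNet.
        if word in self.custom_transforms:
            return self.custom_transforms[word]
        return super().lemmatize(word, pos)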

but this only works when

1) you know which words you want to change or leave unchanged

2) there are only a small number of them. This obviously doesn't scale.
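The same override idea can also be sketched without subclassing, by wrapping any lemmatize function with an exceptions table. This is only a minimal sketch: `with_exceptions` and `naive_base` are hypothetical names, and `naive_base` is a stand-in rule (not NLTK) so the example runs without the WordNet data. With NLTK you would presumably pass `WordNetLemmatizer().lemmatize` as the base function instead.

```python
from typing import Callable, Mapping


def with_exceptions(
    base_lemmatize: Callable[[str, str], str],
    exceptions: Mapping[str, str],
) -> Callable[..., str]:
    """Return a lemmatize function that consults `exceptions` first."""

    def lemmatize(word: str, pos: str = "n") -> str:
        # An explicit override always wins over the base lemmatizer.
        if word in exceptions:
            return exceptions[word]
        return base_lemmatize(word, pos)

    return lemmatize


# Hypothetical stand-in for a real lemmatizer. Its crude "ies" -> "y" rule
# gives "cooky" for "cookies", which is the result the question wants to override.
def naive_base(word: str, pos: str = "n") -> str:
    if pos == "n" and word.endswith("ies"):
        return word[:-3] + "y"
    if pos == "n" and word.endswith("s"):
        return word[:-1]
    return word


lemmatize = with_exceptions(naive_base, {"cookies": "cookie"})
print(lemmatize("cookies"))    # the override applies
print(lemmatize("ponies"))     # falls through to the base rule
print(lemmatize("runs", "v"))  # pos is forwarded unchanged
```

Keeping the exceptions in a plain dict means they can be loaded from a file and grown over time, though someone still has to notice each bad lemma and add it by hand.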

