How can I make compound words singular using an NLP library?

Issue

I'm trying to make compound words singular from plural using spaCy.

However, I run into an error when trying to transform the plural compound words into their singular forms.

How can I get the preferred output shown below?

cute dog
two or three word
the christmas day

Development Environment

Python 3.9.1

Error

    print(str(nlp(word).lemma_))
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'

Code

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    print(str(nlp(word).lemma_))

Trial

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    word = nlp(word)
    for token in word:
        print(str(token.lemma_))

This gives the lemma of each token individually, but not the compound as a whole:

cute
dog
two
or
three
word
the
christmas
day

As you've found out, you can't get the lemma of a doc, only of individual words. Multi-word expressions don't have lemmas in English; lemmas are only for individual words. However, conveniently, in English compound words are pluralized just by pluralizing the last word, so you can just make the last word singular. Here's an example:

import spacy

nlp = spacy.load("en_core_web_sm")


def make_compound_singular(text):
    doc = nlp(text)

    if len(doc) == 1:
        # single word: just return its lemma
        return doc[0].lemma_
    else:
        # keep everything except the last token as-is, then append
        # the last token's lemma (its singular form)
        return doc[:-1].text + doc[-2].whitespace_ + doc[-1].lemma_

texts = ["cute dogs", "two or three words", "the christmas days"]
for text in texts:
    print(make_compound_singular(text))
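With the texts above, this should print the preferred output from the question: cute dog, two or three word, and the christmas day.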
