
How can I make compound words singular using an NLP library?

Issue

I'm trying to convert compound words from plural to singular using spaCy.

However, I get an error that I can't fix when I try to convert a plural compound word to its singular form.

How can I get the preferred output shown below?

cute dog
two or three word
the christmas day

Development Environment

Python 3.9.1

Error

    print(str(nlp(word).lemma_))
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'

Code

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    print(str(nlp(word).lemma_))

Trial

Looping over the tokens works, but it prints each word on a separate line instead of the whole phrase:

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    word = nlp(word)
    for token in word:
        print(str(token.lemma_))

Output:

cute
dog
two
or
three
word
the
christmas
day

As you've found out, you can't get the lemma of a Doc, only of the individual tokens in it. Multi-word expressions don't have lemmas in English; lemmas are defined only for individual words. Conveniently, though, English compound words are usually pluralized by pluralizing just the last word, so you can make the compound singular by lemmatizing only the last word. Here's an example:

import spacy

nlp = spacy.load("en_core_web_sm")


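def make_compound_singular(text):
    doc = nlp(text)

    if len(doc) == 1:
        # A single word: its lemma is the singular form.
        return doc[0].lemma_
    else:
        # Keep everything before the last token unchanged, restore the
        # whitespace that followed the second-to-last token, then append
        # the lemma of the last token.
        return doc[:-1].text + doc[-2].whitespace_ + doc[-1].lemma_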

texts = ["cute dogs", "two or three words", "the christmas days"]
for text in texts:
    print(make_compound_singular(text))
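
One limitation of lemmatizing the last word unconditionally is that it also changes phrases whose last word is not a plural noun (a verb form, for instance). A possible variant, sketched below, only swaps in the lemma when the tagger marks the last token as a plural noun. The function name `singularize_last_noun` is hypothetical; the sketch assumes the fine-grained Penn Treebank tags (`NNS`, `NNPS`) that spaCy's English pipelines expose through `token.tag_`, and it relies only on the token attributes `text`, `text_with_ws`, `lemma_`, and `tag_`.

```python
def singularize_last_noun(tokens):
    """Return the phrase with its final token lemmatized if it is a plural noun.

    `tokens` is expected to be a spaCy Doc, but any sequence of objects with
    `text`, `text_with_ws`, `lemma_`, and `tag_` attributes works.
    (Hypothetical helper, not part of spaCy.)
    """
    if len(tokens) == 0:
        return ""

    *head, last = tokens

    # Assumption: Penn Treebank tags, where NNS is a plural common noun
    # and NNPS is a plural proper noun.
    if last.tag_ in ("NNS", "NNPS"):
        final = last.lemma_
    else:
        final = last.text

    # text_with_ws keeps each token's original trailing whitespace,
    # so the spacing of the input phrase is preserved.
    return "".join(token.text_with_ws for token in head) + final
```

With the pipeline loaded above, it would be called as `singularize_last_noun(nlp("cute dogs"))`. The result depends on the tagger's prediction for the last token, so a phrase whose final word is mis-tagged is returned unchanged.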


 