spaCy lemmatization returns 0

Today I tried spaCy lemmatization for the first time, using the Polish and English pipelines. I wrote a very simple piece of code:

for token in doc:
    print(token, token.lemma)

I don't understand why, but all I got back was "Token, 0" (the token followed by 0). I think I've loaded the language pipeline properly...

token.lemma returns the hash value of the token's lemma (an integer used in spaCy's internal representation).

token.lemma_ gives you the lemma as a string, so this is probably what you want.

For details, see https://spacy.io/api/lemmatizer#assigned-attributes
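To see the difference between the two attributes side by side, here is a minimal sketch. It assumes spaCy v3 and uses a blank English pipeline (tokenizer only) so that no model download is needed; the sample text and the hand-set lemma "cat" are made up for illustration.

```python
import spacy

# Blank English pipeline: it tokenizes, but has no lemmatizer.
nlp = spacy.blank("en")
doc = nlp("cats were running")

# Set one lemma by hand so there is something to compare against.
doc[0].lemma_ = "cat"

for token in doc:
    # token.lemma  -> integer hash of the lemma string (0 if no lemma was set)
    # token.lemma_ -> the lemma as a string ("" if no lemma was set)
    print(token.text, token.lemma, repr(token.lemma_))
```

The first token prints a large integer hash next to 'cat'; the other two print 0 and '' because nothing set their lemmas.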

A string ID of 0 (which maps to the empty string) is returned when no lemma has been set. This most likely means that the pipeline you're using doesn't have a component that provides lemma information.
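One quick way to check this is to list the pipeline's components and run it on a sample sentence to see whether any token gets a non-zero lemma. The sketch below assumes spaCy v3; `pipeline_sets_lemmas` is a hypothetical helper written for this example, not part of spaCy's API, and the blank pipeline stands in for whatever pipeline you actually loaded.

```python
import spacy


def pipeline_sets_lemmas(nlp, text="The cats were running."):
    """Hypothetical helper: run the pipeline on a sample text and report
    whether any token received a lemma (i.e. a non-zero lemma hash)."""
    doc = nlp(text)
    return any(token.lemma != 0 for token in doc)


# A blank pipeline has a tokenizer but no components, so no lemmas are set.
nlp = spacy.blank("en")
print(nlp.pipe_names)
print(pipeline_sets_lemmas(nlp))
```

For the blank pipeline this prints an empty component list and False. To check your own setup, replace `spacy.blank("en")` with `spacy.load(...)` for the pipeline you installed (for example `en_core_web_lg`, if you have it).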

For example, the lemmatizer component in en_core_web_lg is what provides token lemmas in that model. Lemmas are generally set by the rule-based Lemmatizer or the trainable EditTreeLemmatizer component. You can also create your own component that sets lemmas using some other method.
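As a sketch of that last option, assuming spaCy v3, here is a custom component registered with `Language.component`. The component name `lowercase_lemmas` and its "method" (using the lowercased token text as the lemma) are hypothetical placeholders chosen only to show where lemmas get assigned; this is not real lemmatization.

```python
import spacy
from spacy.language import Language


# Hypothetical component: it copies the lowercased token text into
# token.lemma_. It only illustrates how a component sets lemmas; a real
# component would compute proper lemmas here instead.
@Language.component("lowercase_lemmas")
def lowercase_lemmas(doc):
    for token in doc:
        token.lemma_ = token.lower_
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("lowercase_lemmas")

doc = nlp("Cats Were Running")
for token in doc:
    print(token.text, token.lemma_)
```

Once a component in the pipeline writes `token.lemma_`, `token.lemma` holds the matching non-zero hash instead of 0.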
