SpaCy lemmatization returns 0

Question

Today I tried SpaCy lemmatization for the first time, using the Polish and English pipelines. I wrote a very simple piece of code:

for token in doc:
    print(token, token.lemma)

I don't understand why, but what I got in return was only "Token, 0". I think I've properly loaded the language pipeline...

Answer 1

token.lemma accesses the hash value of the token's lemma (an integer, which is its internal representation).

token.lemma_ gives you the lemma as a string, so this is probably what you want.

See https://spacy.io/api/lemmatizer#assigned-attributes for the attributes the lemmatizer assigns.
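To make the difference concrete, here is a minimal sketch, assuming spaCy v3 is installed. It uses a blank English pipeline and sets the lemmas by hand (my choice for illustration, so no model download is needed); with a real pipeline the lemmatizer would fill them in for you.

```python
import spacy

# Blank pipeline: tokenizer only, no lemmatizer (assumption for this sketch).
nlp = spacy.blank("en")
doc = nlp("cats running")

# Set lemmas manually so both attributes have something to show.
doc[0].lemma_ = "cat"
doc[1].lemma_ = "run"

for token in doc:
    # token.lemma is the integer hash, token.lemma_ is the readable string.
    print(token.text, token.lemma, token.lemma_)

lemma_ids = [token.lemma for token in doc]
lemma_strings = [token.lemma_ for token in doc]

# The integer can be mapped back to the string through the vocab's StringStore.
round_trip = [nlp.vocab.strings[lemma_id] for lemma_id in lemma_ids]
```

In the question's loop, changing token.lemma to token.lemma_ is the only edit needed to print strings instead of integers.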

Answer 2

String ID 0 (an empty string) is returned if there is no information. This most likely means that the language model you're using doesn't have a pipeline component that provides lemma information.

For example, the lemmatizer component in en_core_web_lg is what provides token lemmas in that model. Lemmas are generally set using the rule-based Lemmatizer or trained EditTreeLemmatizer components. You can also create your own component that sets lemmas using some other method.
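A rough way to check this, sketched under the assumption of spaCy v3: inspect nlp.pipe_names and see what a pipeline without a lemma-setting component returns. The component name "lowercase_lemmatizer" and its lowercasing rule are hypothetical, made up only to show how a custom component can set lemmas; it is not a real lemmatizer.

```python
import spacy
from spacy.language import Language


# Hypothetical custom component: uses the lowercased text as the "lemma".
@Language.component("lowercase_lemmatizer")
def lowercase_lemmatizer(doc):
    for token in doc:
        token.lemma_ = token.text.lower()
    return doc


# A blank pipeline has no component that sets lemmas.
nlp = spacy.blank("en")
print(nlp.pipe_names)

doc = nlp("The cats were running")
# With no lemma information, every token.lemma is string ID 0.
blank_ids = [token.lemma for token in doc]

# After adding a component that sets lemmas, the values are filled in.
nlp.add_pipe("lowercase_lemmatizer")
doc = nlp("The cats were running")
lemmas = [token.lemma_ for token in doc]
print(lemmas)
```

For a loaded pipeline such as en_core_web_lg, the same nlp.pipe_names check should show whether a lemmatizer is present, and whether it was excluded or disabled when loading.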
