
Punctuation, stopwords and lemmatization with spaCy

I'm trying to apply punctuation removal, stopword removal and lemmatization to a list of strings.

I tried to use lemma_ , is_stop and is_punct :

data = ['We will pray and hope for the best', 
    'Though it may not make landfall all week if it follows that track',
    'Heavy rains, capable of producing life-threatening flash floods, are possible']

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.load("en")

doc = list(nlp.pipe(data))

data_clean = [[w.lemma_ for w in doc if not w.is_stop and not w.is_punct and not w.like_num] for doc in data]

I get the following error: AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'

(same problem for is_stop and is_punct )

In the outer loop you iterate over data , the raw list of strings, but you need to iterate over the processed Doc objects: only the Token objects inside a Doc have the lemma_ , is_stop and is_punct attributes, a plain string does not. Also, your variable names are confusing; the following naming should be clearer:

docs = list(nlp.pipe(data))
data_clean = [[w.lemma_ for w in doc if (not w.is_stop and not w.is_punct and not w.like_num)] for doc in docs]
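To see why the comprehension has to loop over the processed docs rather than the raw strings, here is a minimal self-contained sketch. Tok is a hypothetical stand-in that mimics the spaCy Token attributes used above; it is not a spaCy class:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a spaCy Token, exposing the same attributes
# the comprehension filters on.
@dataclass
class Tok:
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False
    like_num: bool = False

# Each "doc" is a sequence of token objects, analogous to what
# list(nlp.pipe(data)) would yield.
docs = [
    [Tok("we", is_stop=True), Tok("pray"), Tok("hope")],
    [Tok("heavy"), Tok("rain"), Tok(",", is_punct=True), Tok("3", like_num=True)],
]

# Iterating over docs works because every w is a token object.
# Iterating over the raw strings instead would raise AttributeError,
# since a str has no .lemma_ attribute.
data_clean = [
    [w.lemma_ for w in doc if not w.is_stop and not w.is_punct and not w.like_num]
    for doc in docs
]
print(data_clean)  # [['pray', 'hope'], ['heavy', 'rain']]
```

Note also that the spacy.load("en") shorthand only works with older spaCy versions; in spaCy 3 you load an installed pipeline by its full name, e.g. spacy.load("en_core_web_sm").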

