
Removing stop words from tokenized text using NLTK: TypeError

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem import WordNetLemmatizer
import re
import time

txt = input()

snt_tkn = sent_tokenize(txt)

wrd_tkn = [word_tokenize(s) for s in snt_tkn]

stp_wrd = set(stopwords.words("english"))

flt_snt = [w for w in wrd_tkn if not w in stp_wrd]

print(flt_snt)

This returns the following traceback:

Traceback (most recent call last):
  File "compiler.py", line 19, in <module>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
  File "compiler.py", line 19, in <listcomp>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
TypeError: unhashable type: 'list'

I'd like to know, if possible, how to return the tokenized text with stop words removed without editing wrd_tkn.

The error is telling you that a list is unhashable. Lists are unhashable because they are mutable, and set membership tests (w in stp_wrd) need to hash their argument. Here each w is an entire tokenized sentence (a list of words), not a single word, because wrd_tkn is a list of lists. If you ever do need a hashable version of a list, you can convert it to an immutable type such as a tuple:

immutable_words = tuple(some_list)

In this case, though, the fix is to filter the words inside each sentence rather than hashing the sentence lists themselves.
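A minimal illustration of the underlying error, using assumed sample values rather than NLTK output: hashing a list raises the same TypeError seen above, while individual word strings hash fine.

```python
# Assumed sample data, standing in for the NLTK results.
stp_wrd = {"the", "a", "is"}
sentence = ["the", "cat", "sat"]  # a whole tokenized sentence (a list)

try:
    # Membership tests against a set hash their argument;
    # a list cannot be hashed, so this raises TypeError.
    sentence in stp_wrd
except TypeError as e:
    print(e)  # unhashable type: 'list'

# Individual words are strings, which are hashable, so this works:
print("the" in stp_wrd)  # True
```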

For future reference, the resolution is the following:

change

flt_snt = [w for w in wrd_tkn if not w in stp_wrd]

to

flt_snt = [[w for w in s if not w in stp_wrd] for s in wrd_tkn]
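A self-contained sketch of the corrected nested comprehension, using assumed pre-tokenized sentences so it runs without downloading NLTK corpora. The outer loop walks the sentences; the inner comprehension filters the words, leaving wrd_tkn itself unchanged.

```python
# Assumed stand-ins for word_tokenize output and the NLTK stop word set.
wrd_tkn = [["This", "is", "a", "test"], ["Stop", "words", "are", "removed"]]
stp_wrd = {"is", "a", "are"}

# Filter each inner sentence list; wrd_tkn is not modified.
flt_snt = [[w for w in s if w not in stp_wrd] for s in wrd_tkn]
print(flt_snt)  # [['This', 'test'], ['Stop', 'words', 'removed']]
```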
