import nltk
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))  # assumed here: the standard English stop word list

def preprocess_text(text):
    tokenized_document = nltk.tokenize.RegexpTokenizer('[a-zA-Z0-9\']+')
    cleaned_tokens = [word.lower() for word in tokenized_document if word.lower() not in stop_words]
    stemmed_text = [nltk.stem.PorterStemmer().stem(word) for word in cleaned_tokens]
    return stemmed_text

data["Text"] = data["Text"].apply(preprocess_text)
data.head()
Error message:
TypeError: 'RegexpTokenizer' object is not iterable
Your tokenized_document object is an instance of nltk.tokenize.RegexpTokenizer. You are trying to iterate over the values of tokenized_document (in the for word in tokenized_document expression), but nltk.tokenize.RegexpTokenizer doesn't support that usage. That's what the 'RegexpTokenizer' object is not iterable message is telling you.

The source of the problem is that you have not called the tokenize method, and haven't used the text parameter at all.
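To see the distinction, here is a minimal sketch (the sample sentence is made up for illustration): the tokenizer object itself cannot be iterated, but its tokenize method returns an ordinary list of strings that can.

from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer('[a-zA-Z0-9\']+')
# for word in tokenizer: ...  would raise the TypeError shown above
tokens = tokenizer.tokenize("Don't panic, it's fine")
print(tokens)  # ["Don't", 'panic', "it's", 'fine']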
Fix: call .tokenize(text):

tokenized_document = nltk.tokenize.RegexpTokenizer('[a-zA-Z0-9\']+').tokenize(text)
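Putting it together, here is a minimal sketch of the corrected function. It assumes stop_words is the standard English list from nltk.corpus.stopwords; the tokenizer and stemmer are built once at module level rather than on every call, which avoids re-creating them for each row of the DataFrame.

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer('[a-zA-Z0-9\']+')  # words, digits, apostrophes
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))   # assumption: English stop words

def preprocess_text(text):
    tokens = tokenizer.tokenize(text)  # now a list of strings, so iteration works
    cleaned_tokens = [word.lower() for word in tokens if word.lower() not in stop_words]
    return [stemmer.stem(word) for word in cleaned_tokens]

data["Text"] = data["Text"].apply(preprocess_text)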