
I defined a function to tokenize my text, but calling the function raises the error shown below. Kindly assist.

def preprocess_text(text):
    tokenized_document = nltk.tokenize.RegexpTokenizer('[a-zA-Z0-9\']+')
    cleaned_tokens = [word.lower() for word in tokenized_document if word.lower() not in stop_words]
    stemmed_text = [nltk.stem.PorterStemmer().stem(word) for word in cleaned_tokens]
    return stemmed_text

data["Text"] = data["Text"].apply(preprocess_text)

data.head()

Error message:

TypeError: 'RegexpTokenizer' object is not iterable

Your tokenized_document object is an instance of nltk.tokenize.RegexpTokenizer. You are trying to iterate over it (in the for word in tokenized_document expression), but a RegexpTokenizer is not itself a sequence of tokens and doesn't support that usage. That's what the 'RegexpTokenizer' object is not iterable message is telling you.

The source of the problem is that you constructed the tokenizer but never called its tokenize method, and never used the text parameter at all.

Fix: call .tokenize(text):

    tokenized_document = nltk.tokenize.RegexpTokenizer('[a-zA-Z0-9\']+').tokenize(text)
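For completeness, here is a sketch of the whole function with that fix applied. This assumes nltk is installed and that stop_words is a set of lowercase words you have already defined elsewhere (a small placeholder set is used below for illustration); it also builds the tokenizer and stemmer once outside the function so they are not recreated on every call:

```python
import nltk

# Placeholder for the question's stop_words set; substitute your own
# (e.g. nltk.corpus.stopwords.words('english')).
stop_words = {'the', 'are', 'is', 'a'}

# Build these once instead of once per row.
tokenizer = nltk.tokenize.RegexpTokenizer(r"[a-zA-Z0-9']+")
stemmer = nltk.stem.PorterStemmer()

def preprocess_text(text):
    # .tokenize(text) returns a list of strings, which IS iterable
    tokens = tokenizer.tokenize(text)
    cleaned = [w.lower() for w in tokens if w.lower() not in stop_words]
    return [stemmer.stem(w) for w in cleaned]

print(preprocess_text("The cats are running quickly"))
```

After this change, data["Text"].apply(preprocess_text) will produce a list of stemmed tokens for each row.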

