
TypeError: string indices must be integers - Cleaning my text

I'm trying to clean tweets with this class:

import re
from string import punctuation

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

class PreProcessTweets:
    def __init__(self):
        self._stopwords = set(stopwords.words('english') + list(punctuation) + ['AT_USER','URL'])

    def processTweets(self, list_of_tweets):
        processedTweets=[]
        for tweet in list_of_tweets:
            processedTweets.append((self._processTweet(tweet["Text"])))
        return processedTweets

    def _processTweet(self, tweet):
        tweet = tweet.lower() # convert text to lower-case
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet) # replace URLs with the token URL
        tweet = re.sub('@[^\s]+', 'AT_USER', tweet) # replace @usernames with the token AT_USER
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet) # remove the # in #hashtag
        tweet = word_tokenize(tweet) # split the text into tokens
        return [word for word in tweet if word not in self._stopwords] # drop stopwords, punctuation and the placeholder tokens

And when I want to use it:

preprocessedTestSet = tweetProcessor.processTweets(tweet)

I get this error:

TypeError: string indices must be integers

What is wrong? How can I fix it?

Assuming each tweet is a string, you should pass it as is. You've used tweet["Text"], which is an illegal operation on a string, since a string index must be an integer.

def processTweets(self, list_of_tweets):
    processedTweets=[]
    for tweet in list_of_tweets:
        processedTweets.append(self._processTweet(tweet))
    return processedTweets

Or, more Pythonic:

def processTweets(self, list_of_tweets):
    return [self._processTweet(tweet) for tweet in list_of_tweets]
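
To see where the error comes from, here is a minimal sketch. Indexing a str with a string key raises TypeError, while the same expression on a dict works. The helpers indexing_error and extract_text are hypothetical names made up for this illustration and are not part of the original class; extract_text assumes that a non-string tweet is a dict with a "Text" key.

```python
def extract_text(tweet):
    """Return the tweet text for either a plain str or a dict with a "Text" key.

    Hypothetical helper for illustration only.
    """
    if isinstance(tweet, str):
        return tweet
    return tweet["Text"]


def indexing_error(value):
    """Return the name of the exception raised by value["Text"], or None if it succeeds."""
    try:
        value["Text"]
    except (TypeError, KeyError) as exc:
        return type(exc).__name__
    return None


print(indexing_error("just a string"))            # TypeError
print(indexing_error({"Text": "a dict works"}))   # None
print([extract_text(t) for t in ["plain tweet", {"Text": "dict tweet"}]])
# ['plain tweet', 'dict tweet']
```

Also note that if the argument passed to processTweets is a single string rather than a list of strings, the for loop iterates over its characters, so a lone tweet would need to be wrapped in a list first.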

Notes:

You probably forgot to use raw strings ( r"" ) in some of your regexes.
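
As a sketch of what that would look like, the same three substitutions can be written with raw-string patterns. The function name normalize_tweet is hypothetical, and tokenization and stopword removal are left out here so the example runs with the standard library only. In a non-raw string, sequences such as \. and \s are invalid string escapes that Python currently passes through unchanged but warns about, so the raw form is the safer spelling.

```python
import re


def normalize_tweet(text):
    """Apply the substitutions from _processTweet using raw-string patterns."""
    text = text.lower()                                                 # lower-case first
    text = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', text)    # URLs -> URL
    text = re.sub(r'@[^\s]+', 'AT_USER', text)                          # @names -> AT_USER
    text = re.sub(r'#([^\s]+)', r'\1', text)                            # #tag -> tag
    return text


print(normalize_tweet("Check https://example.com @Bob #NLP"))
# check URL AT_USER nlp
```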
