TypeError: string indices must be integers - Cleaning my text
I'm trying to clean tweets with this class:
import re
from string import punctuation

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


class PreProcessTweets:
    def __init__(self):
        self._stopwords = set(stopwords.words('english') + list(punctuation) + ['AT_USER','URL'])

    def processTweets(self, list_of_tweets):
        processedTweets=[]
        for tweet in list_of_tweets:
            processedTweets.append((self._processTweet(tweet["Text"])))
        return processedTweets

    def _processTweet(self, tweet):
        tweet = tweet.lower() # convert text to lower-case
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet) # replace URLs with a URL token
        tweet = re.sub('@[^\s]+', 'AT_USER', tweet) # replace usernames with an AT_USER token
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet) # remove the # in #hashtag
        tweet = word_tokenize(tweet) # split the text into word tokens
        return [word for word in tweet if word not in self._stopwords]
And when I try to use it:
preprocessedTestSet = tweetProcessor.processTweets(tweet)
I get this error:
TypeError: string indices must be integers
What is wrong? How can I fix it?
Assuming each tweet is a string, you should pass it as is. You've used tweet["Text"], which is an illegal operation on a string, since a string index must be an integer.
def processTweets(self, list_of_tweets):
    processedTweets=[]
    for tweet in list_of_tweets:
        processedTweets.append(self._processTweet(tweet))
    return processedTweets
Or, more Pythonic:
def processTweets(self, list_of_tweets):
    return [self._processTweet(tweet) for tweet in list_of_tweets]
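To see where the error comes from, here is a minimal sketch that reproduces it without NLTK. The helper extract_text and the sample data are hypothetical and not part of the question; the sketch assumes a tweet may arrive either as a dict with a "Text" key or as a plain string.

```python
def extract_text(tweet):
    """Return the tweet text whether `tweet` is a dict or a plain string.

    Hypothetical helper, shown only to illustrate the two input shapes.
    """
    if isinstance(tweet, dict):
        return tweet["Text"]  # dict: look the text up by key
    return tweet              # str: it already is the text


# Indexing a string with a string key reproduces the error.
plain_tweet = "some tweet"
try:
    plain_tweet["Text"]
except TypeError as exc:
    error_message = str(exc)

# Mixed input: one dict-shaped tweet and one plain string.
tweets = [{"Text": "Hello @bob"}, "Plain string tweet"]
texts = [extract_text(t) for t in tweets]
```

Also note that if you pass a single string to processTweets instead of a list of strings, the loop iterates over its characters, so wrap a single tweet in a list first: processTweets([tweet]).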
Notes:
You probably forgot to use raw strings (r"") in some of your regexes.
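As a rough illustration of the raw-string note, the sketch below applies the same three substitutions with an r"" prefix on every pattern. The function name normalize and the sample tweet are made up for this example. Without the prefix, sequences such as '\.' and '\s' are invalid string escapes: they happen to work today, but recent Python versions warn about them.

```python
import re


def normalize(tweet):
    """Lower-case a tweet and replace URLs, usernames, and hashtags."""
    tweet = tweet.lower()
    # Raw strings pass the backslashes through to the regex engine untouched.
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    tweet = re.sub(r'@[^\s]+', 'AT_USER', tweet)
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)  # keep the tag, drop the '#'
    return tweet


result = normalize("Check https://example.com @Bob #Python")

# A raw string keeps the backslash as a literal character.
newline_length = len('\n')       # one character: a newline
raw_newline_length = len(r'\n')  # two characters: backslash and 'n'
```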