
Text Mining using tweepy

I collected tweets using the tweepy API, tokenized them, and removed the stopwords. But when I load the result with json, it throws the following error:

  File "C:\Python27\Projects\kik.py", line 26, in <module>
    tweet = json.loads(tokens)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

Please help me out.

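# Imports were not shown in the original post; these are the usual ones for this code.
import json
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

tweets_data_path = 'c:\\Python27\\Projects\\newstweets.txt'
stopset = set(stopwords.words('english'))

tweets_data = []
tweets_file = open(tweets_data_path, "r")
text = tweets_file.read()
tokens = word_tokenize(str(text))
tokens = [w for w in tokens if not w in stopset]
tweet = json.loads(tokens)  # fails here: tokens is a list, not a string
tweets_data.append(tweet)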

json.loads expects a string, but you are passing it a list (the filtered tokens). That is why it raises "TypeError: expected string or buffer".
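The difference can be reproduced in a few lines. This is only a minimal sketch: the token values below are made up and stand in for the filtered list from the question.

```python
import json

# Hypothetical tokens, standing in for the stopword-filtered list.
tokens = ["collected", "tweets", "tweepy"]

# Passing a list: json.loads only accepts text, so this raises TypeError.
try:
    json.loads(tokens)
    list_error = None
except TypeError as exc:
    list_error = exc

# Passing a JSON-formatted string decodes without error.
decoded = json.loads('["collected", "tweets", "tweepy"]')
```

The exact wording of the TypeError message differs between Python versions, but the exception type is the same.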

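Instead of:

tokens = [w for w in tokens if not w in stopset]

Try:

tokens = str([w for w in tokens if not w in stopset])

This makes the TypeError go away, but note that str() of a list yields Python's repr (single-quoted items, for example), which is generally not valid JSON, so json.loads may then raise a ValueError instead. json.dumps(tokens) produces a string that json.loads accepts.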
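If newstweets.txt holds one JSON tweet object per line (an assumption; the question does not show the file contents), it is probably more useful to decode each line first and only then tokenize its text field. Below is a rough sketch of that order of operations. The function name `load_tweet_tokens`, the sample lines, the tiny stopword set, and the whitespace tokenizer are all illustrative stand-ins so the example runs without NLTK; in the question's code they would be `stopwords.words('english')` and `word_tokenize`.

```python
import json

# Stand-in stopword set (assumption); the question uses NLTK's English list.
STOPSET = {"i", "the", "a", "is", "and", "to"}


def tokenize(text):
    """Stand-in tokenizer (assumption); the question uses word_tokenize."""
    return text.lower().split()


def load_tweet_tokens(lines, stopset=STOPSET):
    """Decode each JSON line, then tokenize and filter its 'text' field."""
    tweets_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        try:
            tweet = json.loads(line)  # line is a string, so the type is right
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(tweet, dict):
            continue  # skip JSON values that are not tweet objects
        tokens = [w for w in tokenize(tweet.get("text", "")) if w not in stopset]
        tweets_data.append(tokens)
    return tweets_data


# Made-up sample input: two tweet-like objects, a blank line, and a bad line.
sample_lines = [
    '{"text": "I love the tweepy API"}',
    '',
    'not json',
    '{"text": "Text mining is fun"}',
]
result = load_tweet_tokens(sample_lines)
```

With a real file, the open file object can be passed directly, since iterating over it yields one line at a time.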


 