How to Get Full Tweets in CSV Cells Using Tweepy.Cursor
I'm rather new to coding (I have only used R for regression modeling) and am now learning Python for a research assistantship. My present task is to use for loops and tweepy.Cursor/API search to search tweets by a list of hashtags, convert them to a dataframe, and store the results in a CSV file.
I have managed to do so, but the tweets appear truncated in the cells of the CSV file after using this code (mostly inherited from a grad student to help me get started):
import tweepy as tw
import pandas as pd
import numpy as np
import re
consumer_key = ""
consumer_secret = ""
atoken = ""
asecret = ""
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(atoken, asecret)
api = tw.API(auth, wait_on_rate_limit=True)
hashtag_list = open('hashtag_list.txt', "r")
tweets = []
appended_data = []
tw_all_hashtags = pd.DataFrame(columns = ["text", "hashtag"])
for hashtag in hashtag_list:
    hashtag = hashtag.replace('\n','')
    try:
        for i in tw.Cursor(api.search, q = hashtag, lang = "en", twitter_mode = 'extended').items(25):
            tweets.append(i)
        one_hashtag_df = pd.DataFrame(vars(tweets[i]) for i in range(len(tweets)))
        one_hashtag_df.dropna(subset=['text'], inplace=True)
        one_hashtag_df.drop_duplicates(subset='text', keep="last")
        one_hashtag_df = one_hashtag_df.drop(one_hashtag_df.index[150:])
        one_hashtag_df["hashtag"] = hashtag
        tw_all_hashtags = tw_all_hashtags.append(one_hashtag_df[["text", "hashtag"]], ignore_index=True)
        tweets = []
    except:
        print("Temporary error. Please try again later.")
for i in range(len(tw_all_hashtags)):
    x = tw_all_hashtags.iloc[i]['text']
    tw_all_hashtags.iloc[i]['text'] = ' '.join(
        re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x).split())
tw_all_hashtags['text'] = tw_all_hashtags['text'].str.replace('RT', '')
tw_all_hashtags.reset_index(drop=True).to_csv("tweets_hashtag.csv", index=False)
As you'll see, I tried adding the argument twitter_mode = 'extended' to the tw.Cursor line, but this changed nothing in the final CSV file. I receive no errors, but I still get only cut-off tweets when I view them in Excel.
Any advice for a newbie on how to solve this little problem of mine? Thanks in advance.
Cheers!
Use tweet_mode = "extended" instead (the parameter name is tweet_mode, not twitter_mode).
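One thing to watch after switching to tweet_mode: in extended mode the untruncated text is exposed as full_text rather than text, and for retweets the outer full_text can still be cut off while the original tweet under retweeted_status carries the complete text. Below is a minimal sketch of a helper that picks the right field. The name get_full_text is made up for illustration, and the SimpleNamespace objects are stand-ins for tweepy Status objects, not real API results.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the untruncated text of a tweepy Status-like object.

    Hypothetical helper for illustration; it is not part of tweepy.
    """
    # A retweet's own text may still end in an ellipsis, so prefer the
    # original tweet stored under retweeted_status when it is present.
    retweeted = getattr(status, "retweeted_status", None)
    if retweeted is not None:
        status = retweeted
    # Extended mode exposes full_text; fall back to text for results
    # fetched without tweet_mode='extended'.
    return getattr(status, "full_text", None) or getattr(status, "text", "")


# Stand-in objects that mimic the attributes used above (assumed shapes).
plain = SimpleNamespace(full_text="A long tweet that is not cut off")
retweet = SimpleNamespace(
    full_text="RT @someone: A long tweet that is...",
    retweeted_status=SimpleNamespace(full_text="A long tweet that is not cut off"),
)
legacy = SimpleNamespace(text="A short tweet")

for status in (plain, retweet, legacy):
    print(get_full_text(status))
```

In the loop from the question, this would mean passing tweet_mode = 'extended' to tw.Cursor and building the "text" column from get_full_text(t) for each collected tweet instead of reading a text attribute. As far as I can tell, an extended-mode status has no text attribute at all, so dropna(subset=['text']) would probably raise a KeyError that the bare except then hides behind the "Temporary error" message. Note that taking the text from retweeted_status also drops the "RT @user:" prefix, which the original code strips out anyway.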