
How to Get Full Tweets in CSV Cells Using Tweepy.Cursor

I'm fairly new to coding (I have only used R for regression modeling) and am now learning Python for a research assistantship. My current task is to use for loops and tweepy.Cursor/API search to search tweets by a list of hashtags, convert them to a DataFrame, and store the results in a CSV file.

I have managed to do so, but the tweets appear truncated in the cells of the CSV file after running this code (mostly inherited from a grad student to help me get started):

import re

import pandas as pd
import tweepy as tw

# Credentials from the Twitter developer portal (left blank here).
consumer_key = ""
consumer_secret = ""
atoken = ""
asecret = ""

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(atoken, asecret)
api = tw.API(auth, wait_on_rate_limit=True)

# One hashtag per line; blank lines are skipped.
with open("hashtag_list.txt", "r") as f:
    hashtag_list = [line.strip() for line in f if line.strip()]

tw_all_hashtags = pd.DataFrame(columns=["text", "hashtag"])

for hashtag in hashtag_list:
    try:
        # api.search is the Tweepy 3.x name (renamed api.search_tweets in Tweepy 4).
        tweets = list(tw.Cursor(api.search, q=hashtag, lang="en", twitter_mode="extended").items(25))

        one_hashtag_df = pd.DataFrame([vars(t) for t in tweets])
        one_hashtag_df.dropna(subset=["text"], inplace=True)
        # drop_duplicates returns a new DataFrame, so the result must be assigned.
        one_hashtag_df = one_hashtag_df.drop_duplicates(subset="text", keep="last")
        one_hashtag_df = one_hashtag_df.drop(one_hashtag_df.index[150:])
        one_hashtag_df["hashtag"] = hashtag
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
        tw_all_hashtags = pd.concat([tw_all_hashtags, one_hashtag_df[["text", "hashtag"]]], ignore_index=True)
    except Exception as e:
        print("Temporary error. Please try again later.", e)

# Remove @mentions, non-alphanumeric characters and URLs, then collapse whitespace.
# Assigning the whole column avoids the chained .iloc[i]['text'] assignment, which may not write back.
tw_all_hashtags["text"] = tw_all_hashtags["text"].apply(
    lambda x: " ".join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x).split())
)
tw_all_hashtags["text"] = tw_all_hashtags["text"].str.replace("RT", "")
tw_all_hashtags.reset_index(drop=True).to_csv("tweets_hashtag.csv", index=False)
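The cleaning loop at the end of this code can also be written as a small function applied to the whole column. A minimal sketch, reusing the question's regular expression as a raw string; `clean_tweet` is a hypothetical helper name, and it deliberately strips only a leading `RT` token, because `str.replace('RT', '')` also deletes those letters inside words such as "ARTS":

```python
import re

# Pattern from the question: @mentions, non-alphanumeric characters, URLs.
# The raw string keeps "\w", "\/" and "\S" from being read as string escapes.
CLEAN_RE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")
# Matches the retweet marker only when "RT" is the whole first word.
LEADING_RT_RE = re.compile(r"^RT\b\s*")


def clean_tweet(text: str) -> str:
    """Hypothetical helper: apply the question's cleaning rules to one tweet."""
    # Replace every match with a space, then collapse runs of whitespace.
    collapsed = " ".join(CLEAN_RE.sub(" ", text).split())
    # Drop a leading "RT" without touching "RT" inside other words.
    return LEADING_RT_RE.sub("", collapsed)
```

With pandas it would replace both the loop and the `str.replace` call: `tw_all_hashtags["text"] = tw_all_hashtags["text"].map(clean_tweet)`.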

As you'll see, I tried adding the argument twitter_mode = 'extended' to the tw.Cursor line, but this changed nothing in the final CSV file. I get no errors, but the tweets are still cut off when I view them in Excel. Any advice for a newbie on how to solve this? Thanks in advance. Cheers!

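Use tweet_mode="extended" instead; the parameter is tweet_mode, not twitter_mode. In extended mode the untruncated text comes back in the full_text attribute rather than text, so the DataFrame built from the statuses has a full_text column and no text column. Renaming it right after building the DataFrame, e.g. one_hashtag_df = one_hashtag_df.rename(columns={"full_text": "text"}), lets the rest of the code run unchanged.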
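One more case to watch: for retweets, the top-level full_text is "RT @user: ..." and can still end in an ellipsis; the complete text is in the original tweet under retweeted_status. A minimal sketch, assuming you work on the raw v1.1 JSON dict that Tweepy keeps on each Status as `status._json`; `full_tweet_text` and `rows_for_hashtag` are hypothetical helper names:

```python
from typing import Any, Dict, Iterable, List


def full_tweet_text(tweet: Dict[str, Any]) -> str:
    """Hypothetical helper: untruncated text of a tweet fetched with tweet_mode="extended"."""
    original = tweet.get("retweeted_status")
    if original is not None:
        # Retweet: the top-level full_text may be cut off, so read the original tweet.
        return original.get("full_text", original.get("text", ""))
    # Extended mode returns full_text; the default (compat) mode returns text.
    return tweet.get("full_text", tweet.get("text", ""))


def rows_for_hashtag(tweets: Iterable[Dict[str, Any]], hashtag: str) -> List[Dict[str, str]]:
    """Hypothetical helper: build the ["text", "hashtag"] rows the question's DataFrame uses."""
    return [{"text": full_tweet_text(t), "hashtag": hashtag} for t in tweets]
```

In the question's loop this could be fed with `(s._json for s in tw.Cursor(api.search, q=hashtag, lang="en", tweet_mode="extended").items(25))`, collecting all rows in a list and calling `pd.DataFrame(rows)` once after the loop.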
