Extracting Tweets for a week using Tweepy

I want to store tweets in a CSV file. I used tweepy and managed to store them in CSV, but it only extracts data for one day. I want to extract and store data for a whole week without having to run the extraction every day.

This is what I have done:

def tweets_to_data_frame(public_tweets):
    df = pd.DataFrame(data=[tweet.text for tweet in public_tweets], columns=['Tweets'])
    df['len'] = np.array([len(tweet.text) for tweet in public_tweets])
    df['date'] = np.array([tweet.created_at for tweet in public_tweets])
    df['retweets'] = np.array([tweet.retweet_count for tweet in public_tweets])
    df['lang'] = np.array([tweet.lang for tweet in public_tweets])
    return df

public_tweet= api.search('donald trump')
df = tweets_to_data_frame(public_tweet)
df.to_csv('donaldtrump.csv')
df.head(15)
    Tweets  len date    retweets    lang
0   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:23 67  en
1   RT @errollouis: "If the House ever gets his re...   140 2019-04-09 11:08:23 7927    en
2   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:22 73  en
3   RT @Newsweek: Trump claimed he wouldn't have t...   140 2019-04-09 11:08:21 7   en
4   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:20 67  en
5   The real reason Donald Trump just fired the he...   112 2019-04-09 11:08:19 0   en
6   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:19 73  en
7   RT @BobbyEberle13: Ilhan Omar is now praying f...   140 2019-04-09 11:08:18 457 en
8   The guy met the queen last time out and lots o...   140 2019-04-09 11:08:17 0   en
9   RT @PalmerReport: Donald Trump’s deconstructio...   135 2019-04-09 11:08:17 107 en
10  RT @ByronYork: Donald Trump has been paying ta...   139 2019-04-09 11:08:16 1232    en
11  RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:16 67  en
12  RT @SayWhenLA: 🚨 YUGE !!\n\nPresident Donald J...  140 2019-04-09 11:08:15 1316    en
13  "As long as you're going to be thinking anyway...   100 2019-04-09 11:08:15 0   en
14  RT @TheLastRefuge2: Diana West Discusses The R...   140 2019-04-09 11:08:15 113 en

What I want is the data for one week.

My idea is:

def tweets_to_data_frame1(public_tweets):
    for tweets in tweepy.Cursor(api.search,q = (public_tweets),count=100,
                           since = "2019-04-04",
                           until = "2019-04-07").items():
        df = pd.DataFrame(data=[tweets.text for tweet in tweets], columns=['Tweets'])
        df['len'] = np.array([len(tweets.text) for tweet in tweets])
        df['date'] = np.array([tweets.created_at for tweet in tweets])
        df['retweets'] = np.array([tweets.retweet_count for tweet in tweets])
        df['lang'] = np.array([tweets.lang for tweet in tweets])

        return df

df1 = tweets_to_data_frame1('donald trump')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-96745c16c99c> in <module>
----> 1 df1 = tweets_to_data_frame1('donald trump')

<ipython-input-23-e5866a4adb3f> in tweets_to_data_frame1(public_tweets)
      3                            since = "2019-04-04",
      4                            until = "2019-04-07").items():
----> 5         df = pd.DataFrame(data=[tweets.text for tweet in tweets], columns=['Tweets'])
      6 
      7         #df['id'] = np.array([tweet.id for tweet in tweets])

TypeError: 'Status' object is not iterable

Expected results:

Tweets  len date    retweets    lang
0   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:23 67  en
1   RT @errollouis: "If the House ever gets his re...   140 2019-04-09 11:08:23 7927    en
2   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:22 73  en
3   RT @Newsweek: Trump claimed he wouldn't have t...   140 2019-04-09 11:08:21 7   en
4   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:20 67  en
5   The real reason Donald Trump just fired the he...   112 2019-04-09 11:08:19 0   en
6   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:19 73  en
7   RT @BobbyEberle13: Ilhan Omar is now praying f...   140 2019-04-09 11:08:18 457 en
8   The guy met the queen last time out and lots o...   140 2019-04-09 11:08:17 0   en
9   RT @PalmerReport: Donald Trump’s deconstructio...   135 2019-04-09 11:08:17 107 en
10  RT @ByronYork: Donald Trump has been paying ta...   139 2019-04-09 11:08:16 1232    en
11  RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:16 67  en
12  RT @SayWhenLA: 🚨 YUGE !!\n\nPresident Donald J...  140 2019-04-09 11:08:15 1316    en
13  "As long as you're going to be thinking anyway...   100 2019-04-09 11:08:15 0   en
14  RT @TheLastRefuge2: Diana West Discusses The R...   140 2019-04-09 11:08:15 113 en

but for one week.

So I guess the issue is here:

for tweets in tweepy.Cursor(api.search,q = (public_tweets),count=100,since = "2019-04-04",until = "2019-04-07").items():

tweepy.Cursor(...).items() returns an iterator that yields one tweet (a Status object) at a time, so in your for loop each value of the tweets variable is a single tweet. Inside the loop you then use a list comprehension that tries to iterate over that single tweet, which is exactly what the error message is telling you.
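The failure can be reproduced without touching the Twitter API. The sketch below is only an illustration: the Status class and the items() generator are hypothetical stand-ins for tweepy's objects, not the library's real implementation.

```python
class Status:
    """Hypothetical stand-in for tweepy's Status object (illustration only)."""

    def __init__(self, text):
        self.text = text


def items():
    """Hypothetical stand-in for Cursor(...).items(): yields one Status at a time."""
    yield Status('first tweet')
    yield Status('second tweet')


# Same shape as the code in the question: the loop variable is ONE Status,
# and the comprehension then tries to iterate over that single object.
error_message = None
for tweets in items():
    try:
        [tweet.text for tweet in tweets]
    except TypeError as exc:
        error_message = str(exc)
        break

# Iterating over the items themselves, without the outer loop, works.
texts = [tweet.text for tweet in items()]
```

Here error_message ends up holding the same text as the traceback in the question, while texts contains the text of both stand-in tweets.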

What you could do instead is something like:

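# Materialize the results once: the iterator can only be consumed a single time,
# and every column of the DataFrame needs to walk over the same tweets.
tweets = list(tweepy.Cursor(...).items())
df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])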

BTW, I would also rename the public_tweets argument of def tweets_to_data_frame1(public_tweets):

In this case the public_tweets argument is just a search query string, so the name is misleading.
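Putting both suggestions together, a corrected version of the function might look roughly like the sketch below. The names tweets_to_records, search_to_data_frame, COLUMNS and limit are made up for this example. The api.search call with since and until simply mirrors the call from the question; as far as I know, newer Tweepy releases rename this method, so treat that call as an assumption about the version the question uses.

```python
from itertools import islice

COLUMNS = ['Tweets', 'len', 'date', 'retweets', 'lang']


def tweets_to_records(tweets, limit=None):
    """Turn an iterable of tweet-like objects into a list of plain dicts.

    Each object only needs .text, .created_at, .retweet_count and .lang,
    the same attributes the question already reads from Status objects.
    """
    records = []
    # islice walks the iterator exactly once; limit=None means "take everything".
    for tweet in islice(tweets, limit):
        records.append({
            'Tweets': tweet.text,
            'len': len(tweet.text),
            'date': tweet.created_at,
            'retweets': tweet.retweet_count,
            'lang': tweet.lang,
        })
    return records


def search_to_data_frame(api, query, since, until, limit=None):
    """Hypothetical wrapper: run the question's search and return a DataFrame."""
    # Imported here so tweets_to_records stays usable without these packages.
    import pandas as pd
    import tweepy

    # Assumption: same call as in the question (api.search with since/until).
    cursor = tweepy.Cursor(api.search, q=query, count=100,
                           since=since, until=until)
    return pd.DataFrame(tweets_to_records(cursor.items(), limit), columns=COLUMNS)
```

With an authenticated api object, usage would be along the lines of df1 = search_to_data_frame(api, 'donald trump', '2019-04-04', '2019-04-07') followed by df1.to_csv(...). The query is now passed under the name query, which addresses the naming remark above. Building each row in a single pass also avoids looping over the results once per column.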


 