Collecting tweets using screen names and saving them using Tweepy

I have a list of Twitter screen names and want to collect 3200 tweets per screen name. Below is the code I adapted from https://gist.github.com/yanofsky/5436496

import csv

#api is assumed to be an authenticated tweepy.API instance created earlier

#initialize a list to hold all the tweepy Tweets
alltweets = []

#screen names
r=['user_a', 'user_b', 'user_c']

#saving tweets
writefile=open("tweets.csv", "wb")
w=csv.writer(writefile)

for i in r:

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = i, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = i[0],count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #write the csv
    for tweet in alltweets:
        w.writerow([i, tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")])

writefile.close()

At the end, the final CSV file contains 3200 tweets for user_a, about 6400 tweets for user_b, and 9600 tweets for user_c. Something is wrong in the code above: there should be about 3200 tweets for each user. Can anyone point out what is wrong? Thanks.

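Because you use .extend() to add to alltweets and never reset it, each iteration of the for loop appends the next user's tweets to the ones already collected for the previous users. The write loop then saves the whole accumulated list under the current screen name, which is why the row counts grow by about 3200 per user. So you want to clear alltweets at the start of each for loop iteration: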

for i in r:
    alltweets = []
    ...
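
For completeness, here is a minimal sketch of how the whole loop could look with the list reset per user. It is written for Python 3 (the question's code is Python 2), and it assumes api is an already authenticated tweepy.API object whose user_timeline accepts screen_name, count and max_id as in the question. The function names fetch_user_tweets and save_timelines are made up for illustration. The sketch also passes the full screen name on every request (the question passes i[0], which is only the first character of the name, on the follow-up requests) and it only reads the oldest id after checking that a page came back, so an account with no tweets does not raise an IndexError.

```python
import csv


def fetch_user_tweets(api, screen_name, page_size=200):
    """Collect one user's timeline, paging backwards with max_id.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    tweets = []  # fresh list for every user, so nothing carries over

    # most recent page first (200 is the maximum count, per the question)
    new_tweets = api.user_timeline(screen_name=screen_name, count=page_size)

    while len(new_tweets) > 0:
        tweets.extend(new_tweets)

        # only ask for tweets strictly older than the oldest one seen so far
        oldest = tweets[-1].id - 1
        new_tweets = api.user_timeline(
            screen_name=screen_name, count=page_size, max_id=oldest
        )

    return tweets


def save_timelines(api, screen_names, path="tweets.csv"):
    """Write one CSV row per tweet, labelled with the account it came from."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for name in screen_names:
            for tweet in fetch_user_tweets(api, name):
                writer.writerow(
                    [name, tweet.id_str, tweet.created_at, tweet.text]
                )
```

With an authenticated api object, calling save_timelines(api, ['user_a', 'user_b', 'user_c']) should then write each user's tweets only once, under that user's own screen name.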
