Collecting tweets using screen names and saving them using Tweepy

I have a list of Twitter screen names and want to collect 3200 tweets per screen name. Below is the code I adapted from https://gist.github.com/yanofsky/5436496:

#initialize a list to hold all the tweepy Tweets
alltweets = []

#screen names
r=['user_a', 'user_b', 'user_c']

#saving tweets
writefile=open("tweets.csv", "wb")
w=csv.writer(writefile)

for i in r:

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = i, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = i[0],count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #write the csv
    for tweet in alltweets:
        w.writerow([i, tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")])

writefile.close()

In the end, the final CSV file contains 3200 tweets for user_a, about 6400 tweets for user_b, and 9600 tweets for user_c. Something in the code above is not correct: there should be about 3200 tweets for each user. Can anyone point out what is wrong with the code? Thanks.

Because alltweets is created once, before the for loop, and you only ever add to it with .extend(), each user's tweets are appended on top of the tweets already collected for the previous users. The CSV-writing step then writes the whole accumulated list under the current screen name, which is why you see about 3200 rows for user_a, 6400 for user_b, and 9600 for user_c. So you want to clear alltweets at the start of each for loop iteration:

for i in r:
    alltweets = []
    ...
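
For reference, the same fix can be sketched as a pair of small functions, so that each user gets a fresh list by construction. This is only a sketch under a few assumptions: it targets Python 3 (the question's code is Python 2), the helper names collect_user_tweets and save_tweets are made up for illustration, and api is assumed to be an already authenticated Tweepy API object as in the question. It also passes the full screen name on follow-up requests, whereas the question's code passes i[0], which is only the first character of the name.

```python
import csv


def collect_user_tweets(api, screen_name, page_size=200):
    """Return one user's tweets in a fresh list (hypothetical helper)."""
    tweets = []      # created per call, so users never mix
    max_id = None    # no upper bound on the first request
    while True:
        kwargs = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break    # no tweets left to grab for this user
        tweets.extend(page)
        # ask only for tweets older than the oldest one seen so far
        max_id = page[-1].id - 1
    return tweets


def save_tweets(api, screen_names, path):
    """Write every user's tweets to one CSV file (hypothetical helper)."""
    # Python 3: open in text mode with newline="" instead of "wb"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for name in screen_names:
            for tweet in collect_user_tweets(api, name):
                writer.writerow(
                    [name, tweet.id_str, tweet.created_at, tweet.text]
                )
```

Calling save_tweets(api, ['user_a', 'user_b', 'user_c'], "tweets.csv") would then write each user's rows only once. Because the loop stops as soon as a request returns no tweets, a user with an empty timeline is skipped instead of raising an IndexError on the empty list.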
