Collecting tweets by screen name and saving them with Tweepy

Question

I have a list of Twitter screen names and want to collect 3200 tweets for each of them. Below is the code I adapted from https://gist.github.com/yanofsky/5436496:

import csv

#`api` is assumed to be an already authenticated tweepy.API instance (setup not shown)

#initialize a list to hold all the tweepy Tweets
alltweets = []

#screen names
r=['user_a', 'user_b', 'user_c']

#saving tweets
writefile=open("tweets.csv", "wb")
w=csv.writer(writefile)

for i in r:

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = i, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = i,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #write the csv
    for tweet in alltweets:
        w.writerow([i, tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")])

writefile.close()

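In the end, the final CSV file contains 3200 tweets for user_a, about 6400 tweets for user_b and 9600 tweets for user_c. Something in the code above is wrong: each user should have roughly 3200 tweets. Can anyone point out the error in my code? Thanks.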

Answer 1

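Because you add to alltweets with .extend() and never reset it, each iteration of the for loop piles the next user's tweets on top of those of the previous users. The writing loop then emits everything accumulated so far under the current screen name, which is why the row counts grow to about 6400 and 9600. So you want to clear alltweets at the start of each for-loop iteration: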

for i in r:
    alltweets = []
    ...
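
For reference, here is a fuller sketch of the same fix, written for Python 3 rather than the Python 2 of the question. It assumes `api` is any object whose `user_timeline(screen_name=..., count=..., max_id=...)` behaves like the call used in the question. The helper names `collect_timeline` and `save_timelines` are made up for this illustration and are not part of Tweepy. Building the list inside a function gives every user a fresh list, and checking for an empty page before indexing also avoids an IndexError for accounts with no tweets.

```python
import csv


def collect_timeline(api, screen_name, page_size=200):
    """Return every tweet the API hands back for one user, newest first."""
    tweets = []      # fresh list per call, so nothing carries over between users
    max_id = None    # no upper bound on the first request
    while True:
        kwargs = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:  # an empty page means the timeline is exhausted
            break
        tweets.extend(page)
        # ask next for tweets strictly older than the oldest one seen so far
        max_id = page[-1].id - 1
    return tweets


def save_timelines(api, screen_names, path):
    """Write one CSV row per tweet: screen name, id, timestamp, text."""
    # Python 3: open in text mode with newline="" and let csv handle encoding
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for name in screen_names:
            for tweet in collect_timeline(api, name):
                writer.writerow([name, tweet.id_str, tweet.created_at, tweet.text])
```

With the authenticated `api` object from the question, this would be called as `save_timelines(api, ['user_a', 'user_b', 'user_c'], "tweets.csv")`, and each screen name should then account for only its own rows.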

Answer 1: accepted, 2016-04-22 01:15:07
