Exclude Retweets and Replies when getting Tweets from user timeline - Tweepy

I use the following code to download Tweets from a user's timeline with Tweepy. However, it also returns the user's Retweets and Replies. I want only the Tweets the user posted on their own timeline. How can I filter the results?

The reason is that I want to collect Tweets that cosmetics companies post about their products. The Tweets on their timelines give me this. Replies and Retweets, however, look like regular conversations and do not talk about products, so I want to filter them out.

import tweepy
import csv
import time

# Twitter API credentials
consumer_key = "xxxxxxx"
consumer_secret = "xxxxx"
access_key = "xxxxxxx"
access_secret = "xxxx"

def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3,200 tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # nothing to do if the timeline is empty; otherwise save the id of the oldest tweet less one
    if not alltweets:
        return
    oldest = alltweets[-1].id - 1

    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before %s" % oldest)

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest, include_entities=True)

        # save most recent tweets
        alltweets.extend(new_tweets)

        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...%s tweets downloaded so far" % len(alltweets))

    # pass screen_name by keyword; newer Tweepy releases no longer accept it positionally
    user = api.get_user(screen_name=screen_name)
    followers_count = user.followers_count


    # transform the tweepy tweets into a 2D array that will populate the csv
    # (text stays a str; the file is opened with encoding='utf-8' when it is written)
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text, 1 if 'media' in tweet.entities else 0,
                  1 if tweet.entities.get('hashtags') else 0, followers_count, tweet.retweet_count, tweet.favorite_count]
                 for tweet in alltweets]


    # write the csv (newline='' keeps the csv module from adding blank rows on Windows)
    with open('tweets.csv', mode='a', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        #writer.writerow(["id", "created_at", "text", "hasMedia", "hasHashtag", "followers_count", "retweet_count", "favourite_count"])
        writer.writerows(outtweets)

def main():
    get_all_tweets("@MACcosmetics")


if __name__ == '__main__':
    main()

Unfortunately, Tweepy does not have this, but you can use python-twitter instead.

It has this method:

def GetHomeTimeline(self,
                        count=None,
                        since_id=None,
                        max_id=None,
                        trim_user=False,
                        exclude_replies=False,
                        contributor_details=False,
                        include_entities=True):

and it should work well in your case.
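One caveat on the method quoted above: GetHomeTimeline returns the authenticated account's home feed, not another account's timeline. As far as I know, python-twitter also has GetUserTimeline, which takes screen_name, include_rts and exclude_replies. Below is a minimal sketch under that assumption. The names fetch_own_tweets and FakeApi are hypothetical; FakeApi only stands in for an authenticated twitter.Api object so the snippet runs without credentials.

```python
def fetch_own_tweets(api, screen_name, count=200):
    """Request a user's timeline without retweets or replies.

    `api` is assumed to be an authenticated twitter.Api instance
    (python-twitter); the keyword names below are assumed to match
    its GetUserTimeline method.
    """
    return api.GetUserTimeline(
        screen_name=screen_name,
        count=count,
        include_rts=False,     # leave out retweets
        exclude_replies=True,  # leave out replies
    )


class FakeApi:
    """Hypothetical stand-in for twitter.Api; records the call it receives."""

    def GetUserTimeline(self, **kwargs):
        self.last_call = kwargs
        return []


fake = FakeApi()
result = fetch_own_tweets(fake, "MACcosmetics")
print(fake.last_call)
```

With a real twitter.Api object in place of FakeApi, the returned list should hold only the account's own posts.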

There are some search operators you can send to Twitter to filter the response.

exclude:retweets exclude:replies

So basically "SearchParams" + exclude:retweets exclude:replies should work.

There are a few more if you want to check them out: https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators.html
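To make the operator approach concrete, here is a small sketch that assembles such a query string for one account. The helper name build_search_query is made up for illustration, and combining the exclude operators with from: is my assumption about how "SearchParams" would look for this use case. The resulting string would be passed as the q argument to Tweepy's search call (api.search in Tweepy 3.x, api.search_tweets in 4.x). Keep in mind that the standard search API only reaches back about a week, so it is not a full replacement for user_timeline.

```python
def build_search_query(screen_name, exclude_retweets=True, exclude_replies=True):
    """Build a standard-search query for tweets sent by one account."""
    # Drop a leading "@" so "MACcosmetics" and "@MACcosmetics" both work.
    parts = ["from:" + screen_name.lstrip("@")]
    if exclude_retweets:
        parts.append("exclude:retweets")
    if exclude_replies:
        parts.append("exclude:replies")
    return " ".join(parts)


query = build_search_query("@MACcosmetics")
print(query)  # from:MACcosmetics exclude:retweets exclude:replies
```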

I had the same issue and have just solved it this way. I hope it helps someone in the future.

I did it this way and it worked:

tweepy.Cursor(api.user_timeline,
              screen_name=usuario,
              count=None,
              since_id=None,
              max_id=None,
              trim_user=True,
              exclude_replies=True,
              contributor_details=False,
              include_entities=False
              ).items(200)

tweepy.Cursor(api.user_timeline, screen_name='anyname', include_rts=False)

The option include_rts=False removes retweets.

Reference link.
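The two options from the answers above can be passed in the same user_timeline call (exclude_replies=True together with include_rts=False). Twitter applies count before it filters, so a page can come back with fewer than count tweets. If you would rather filter on the client, or want a second check, here is a minimal sketch. The function name is_original_tweet is hypothetical, and the SimpleNamespace objects only imitate the two attributes of a Tweepy Status that the check reads, so the snippet runs without credentials.

```python
from types import SimpleNamespace


def is_original_tweet(tweet):
    """Return True if the status is neither a retweet nor a reply."""
    # Retweets carry the original status in a `retweeted_status` attribute.
    is_retweet = hasattr(tweet, "retweeted_status")
    # Replies have `in_reply_to_status_id` set to the id of the parent tweet.
    is_reply = getattr(tweet, "in_reply_to_status_id", None) is not None
    return not (is_retweet or is_reply)


# Stand-ins for Status objects: one plain tweet, one reply, one retweet.
statuses = [
    SimpleNamespace(id=1, text="New lipstick out now", in_reply_to_status_id=None),
    SimpleNamespace(id=2, text="@fan thanks!", in_reply_to_status_id=99),
    SimpleNamespace(id=3, text="RT @other: hello", in_reply_to_status_id=None,
                    retweeted_status=SimpleNamespace(id=50)),
]

own_tweets = [s for s in statuses if is_original_tweet(s)]
print([s.id for s in own_tweets])  # [1]
```

In the question's code, the same check could be applied to alltweets before the CSV rows are built.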
