Scrape tweets from a list of hashtags using snscrape

I am trying to scrape tweets using snscrape. I am able to scrape by location and to scrape tweets that contain a specific phrase. My question is: how can I scrape tweets that contain items from a list I create? For example, let's say I want to look for these hashtags:

hashtags = ('data analytics', 'data science', 'machine learning')

I want to search in an OR sense: a tweet should match if it contains any one of the hashtags in the list, a combination of them, or all of them.

To scrape tweets by hashtag, you have to search for them as #hashtag. In your example that means something like #dataanalytics #datascience. If you want an OR operator between them in your search, just add it: (#dataanalytics OR #datascience). Below is a function I wrote that scrapes tweets and returns a DataFrame with some features of interest to me. n_tweet puts an upper bound on the number of tweets you want. After the function there is also an example call.

import pandas as pd
import snscrape.modules.twitter as sntwitter


def tweet_scraper(query, n_tweet):
    """Scrape up to n_tweet tweets matching query and return them as a DataFrame."""
    attributes_container = []

    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        # Stop once n_tweet tweets have been collected (i is zero-based).
        if i >= n_tweet:
            break

        # One row per tweet: user details first, then tweet details.
        attributes_container.append([tweet.user.username,
                                     tweet.user.verified,
                                     tweet.user.created,
                                     tweet.user.followersCount,
                                     tweet.user.friendsCount,
                                     tweet.retweetCount,
                                     tweet.lang,
                                     tweet.date,
                                     tweet.likeCount,
                                     tweet.sourceLabel,
                                     tweet.id,
                                     tweet.content,
                                     tweet.hashtags,
                                     tweet.conversationId,
                                     tweet.inReplyToUser,
                                     tweet.coordinates,
                                     tweet.place])

    # Column names follow the same order as the row values above.
    return pd.DataFrame(attributes_container, columns=["User",
                                                       "verified",
                                                       "Date_Created",
                                                       "Follows_Count",
                                                       "Friends_Count",
                                                       "Retweet_Count",
                                                       "Language",
                                                       "Date_Tweet",
                                                       "Number_of_Likes",
                                                       "Source_of_Tweet",
                                                       "Tweet_Id",
                                                       "Tweet",
                                                       "Hashtags",
                                                       "Conversation_Id",
                                                       "In_reply_To",
                                                       "Coordinates",
                                                       "Place"])

example = tweet_scraper('(#example OR #suggestion) since:2020-09-01 until:2022-09-01', 500000)
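To get from the list in the question to a query string like the one above, something along these lines might work. It is only a sketch: build_hashtag_query is a hypothetical helper name, not part of snscrape, and it assumes each phrase should become a single hashtag with its spaces removed, since hashtags cannot contain spaces.

```python
def build_hashtag_query(phrases, since=None, until=None):
    """Build a search query such as '(#a OR #b) since:... until:...'.

    Hypothetical helper for illustration; not part of snscrape.
    """
    # Remove all whitespace from each phrase and make sure it starts with one '#'.
    tags = ["#" + "".join(phrase.split()).lstrip("#") for phrase in phrases]

    # Join the hashtags with OR so a tweet matches if it contains any of them.
    query = "(" + " OR ".join(tags) + ")"

    # Optionally restrict the date range with the since:/until: operators.
    if since:
        query += f" since:{since}"
    if until:
        query += f" until:{until}"
    return query


hashtags = ('data analytics', 'data science', 'machine learning')
query = build_hashtag_query(hashtags, since="2020-09-01", until="2022-09-01")
print(query)
```

The resulting string can then be passed as the query argument of tweet_scraper, together with whatever upper bound on the number of tweets you want.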
