Tweets scraping - how to measure tweeting intensity?

I am looking for a way to measure the "trend" of a hashtag or keyword on Twitter. Say I want to measure how often the hashtag/keyword "Python" is tweeted over time. For instance, today "Python" is tweeted on average once every minute, whereas yesterday it was tweeted on average once every 2 minutes.

I have tried various options, but I keep running into the Twitter API limits: if I try to download all tweets for a hashtag from, say, the last day, only a fraction of them is downloaded (via tweepy.Cursor).

Do you have any ideas or example scripts for achieving something like this? Any libraries or guides to recommend? I did not find anything helpful when searching the internet. Thank you.

You should check out the twint repository.

  • Can fetch almost all Tweets (the Twitter API limits scrapes to the last 3200 Tweets only);
  • Fast initial setup;
  • Can be used anonymously and without signing up for Twitter.

Here is some sample code:

import twint


def scrapeData(search):
    c = twint.Config()

    c.Search = search

    c.Since = '2021-03-05 00:00:00'
    c.Until = '2021-03-06 00:00:00'
    c.Pandas = True
    c.Store_csv = True
    c.Hide_output = True
    c.Output = f'{search}.csv'
    c.Limit = 10  # maximum number of tweets to fetch

    print(f"\n#### Scraping from {c.Since} to {c.Until}")
    twint.run.Search(c)

    print("\n#### Preview: ")
    print(twint.storage.panda.Tweets_df.head())


if __name__ == "__main__":
    scrapeData(search="python")
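
Once the tweets are collected, the intensity described in the question (one tweet every N minutes on average) can be derived from their creation timestamps. Below is a minimal sketch using only the standard library. The function name `mean_gap_minutes_per_day` and the sample timestamps are made up for illustration; in practice the timestamps would come from whichever column of the scraper's output holds the creation time, which is an assumption to check against your data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_gap_minutes_per_day(timestamps):
    """Return {date: average minutes between consecutive tweets on that date}."""
    # Group the (sorted) timestamps by calendar day.
    by_day = defaultdict(list)
    for ts in sorted(timestamps):
        by_day[ts.date()].append(ts)

    result = {}
    for day, stamps in by_day.items():
        if len(stamps) < 2:
            continue  # at least two tweets are needed to measure a gap
        # Gaps between each pair of consecutive tweets, in minutes.
        gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
        result[day] = mean(gaps)
    return result


if __name__ == "__main__":
    # Hypothetical timestamps standing in for scraped tweet creation times.
    sample = [
        datetime(2021, 3, 4, 12, 0),
        datetime(2021, 3, 4, 12, 2),
        datetime(2021, 3, 4, 12, 4),
        datetime(2021, 3, 5, 12, 0),
        datetime(2021, 3, 5, 12, 1),
        datetime(2021, 3, 5, 12, 2),
    ]
    for day, gap in sorted(mean_gap_minutes_per_day(sample).items()):
        print(day, f"one tweet every {gap:.1f} min on average")
```

For the averages to mean anything, the scrape has to cover the whole time window; a capped sample (for example a small c.Limit) would only reflect the gaps among the tweets that happened to be fetched.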

Try a library called GetOldTweets or GetOldTweets3.

Twitter Search, and by extension its API, is not meant to be an exhaustive source of tweets. The standard Twitter Search API only reaches back about one week when returning tweets that match the input parameters. So to extract all historical tweets matching a set of search parameters for analysis, the official Twitter API needs to be bypassed, and custom libraries that mimic the Twitter search engine need to be used.
