Most efficient way to find inactive Twitter followers using LINQ to Twitter

The number of queries allowed by the Twitter API is limited. On the other hand, the definition of "inactive user" may imply a different algorithm, which in turn affects the number of requests.

I'm looking for the most efficient way, in terms of both the number of queries and the quality of the "inactivity" detection, to find inactive followers using LINQ to Twitter.

As you have probably learned by now, rate limits and count restrictions prevent a lot of operations on the Twitter API. With these constraints, most answers will be less than adequate, but here's the general approach I would use:

  1. Get the list of all follower IDs using the Listing Followers query. Make sure you max out Count at 5000 to reduce the number of queries. If you have users with hundreds of thousands (or even millions) of followers, this isn't optimal, but it's still the most efficient option.
  2. With that list, you can run Querying User Details queries. The situation here is even worse, because the maximum number of comma-separated user IDs per query is 100. Here you might consider keeping track of user IDs, classifying them by activity and date of last scan, to avoid re-visiting users you already know are inactive.
  3. That last query gives you User entities. Each User entity has a Status property holding the user's most recent tweet. One idea is to examine that status's CreatedAt date to decide whether to query the user any further; e.g., if the last tweet was N months ago, the user is probably inactive.
  4. Use ApplicationOnlyAuthorizer when you can, because it gives you higher rate limits.
  5. Your rate-limit windows are 15 minutes. Create pipelines: run one query type until it reaches its limit, and queue the results for the next task in the chain. Let the next task use its own limit and keep going from there.
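Steps 1 to 3 can be outlined in Python as a library-agnostic sketch. It rests on stated assumptions: `fetch_id_page` and `lookup_users` are hypothetical callables standing in for the Listing Followers and Querying User Details queries (they are not LINQ to Twitter APIs); the cursor loop assumes Twitter's cursoring convention, where -1 requests the first page and a returned cursor of 0 means there are no more pages; and "inactive" is taken to mean that the most recent Status is missing or older than `max_idle` (the answer's N months).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical stand-ins for the two queries (NOT real LINQ to Twitter calls):
#   fetch_id_page(cursor) -> (follower_ids, next_cursor); next_cursor == 0 means no more pages
#   lookup_users(ids)     -> {user_id: CreatedAt of the user's latest Status, or None}
FetchIdPage = Callable[[int], Tuple[List[int], int]]
LookupUsers = Callable[[List[int]], Dict[int, Optional[datetime]]]

IDS_PER_PAGE = 5000       # max Count for the follower-ID listing
USERS_PER_LOOKUP = 100    # max comma-separated IDs per user-details lookup


def estimate_queries(n_followers: int) -> Tuple[int, int]:
    """Minimum (ID-listing pages, user lookups) needed for n_followers (ceiling division)."""
    return -(-n_followers // IDS_PER_PAGE), -(-n_followers // USERS_PER_LOOKUP)


def all_follower_ids(fetch_id_page: FetchIdPage) -> List[int]:
    """Step 1: follow the cursor from -1 (first page) until it comes back as 0."""
    ids: List[int] = []
    cursor = -1
    while cursor != 0:
        page, cursor = fetch_id_page(cursor)
        ids.extend(page)
    return ids


def batches(items: List[int], size: int) -> Iterable[List[int]]:
    """Split items into consecutive chunks of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_inactive(ids: List[int], lookup_users: LookupUsers, now: datetime,
                  max_idle: timedelta, known_inactive: Set[int]) -> Set[int]:
    """Steps 2-3: look users up 100 at a time; flag those whose last tweet is older than max_idle."""
    inactive = set(known_inactive) & set(ids)          # keep earlier verdicts for current followers
    to_check = [i for i in ids if i not in inactive]   # don't spend lookups on them again
    for batch in batches(to_check, USERS_PER_LOOKUP):
        for user_id, last_tweet in lookup_users(batch).items():
            if last_tweet is None or now - last_tweet > max_idle:
                inactive.add(user_id)
    return inactive


if __name__ == "__main__":
    # Fake in-memory data standing in for real API responses.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    pages = {-1: ([1, 2, 3], 7), 7: ([4, 5], 0)}
    last_tweet = {1: now - timedelta(days=10), 2: now - timedelta(days=400),
                  3: None, 4: now - timedelta(days=5), 5: now - timedelta(days=200)}
    ids = all_follower_ids(lambda cursor: pages[cursor])
    flagged = find_inactive(ids, lambda batch: {i: last_tweet[i] for i in batch},
                            now, timedelta(days=180), known_inactive={5})
    print(sorted(flagged))            # [2, 3, 5]
    print(estimate_queries(250_000))  # (50, 2500)
```

Because users already flagged in an earlier scan are filtered out before batching, repeat scans spend the 100-ID lookups only on followers whose status is still unknown or active.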
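The pipeline idea in step 5 can be modeled with a small simulation that estimates how many 15-minute windows a full scan takes. This is a sketch under assumptions: the per-window call budgets are parameters you would fill in from Twitter's current rate-limit documentation (the 15 and 180 in the demo are placeholders, not documented limits), and the two stages are assumed to run concurrently, so IDs listed in a window can be looked up in that same window.

```python
def windows_needed(n_followers: int, id_calls_per_window: int, lookup_calls_per_window: int,
                   ids_per_call: int = 5000, users_per_call: int = 100) -> int:
    """Count the rate-limit windows a two-stage pipeline needs.

    Stage 1 lists follower IDs and queues them; stage 2 drains the queue with
    user lookups. Each stage spends only its own per-window budget, so both
    stages make progress in every window.
    """
    if id_calls_per_window < 1 or lookup_calls_per_window < 1:
        raise ValueError("each stage needs at least one call per window")
    to_list = n_followers   # IDs stage 1 has not fetched yet
    queued = 0              # IDs fetched by stage 1 but not yet looked up
    looked_up = 0
    windows = 0
    while looked_up < n_followers:
        windows += 1
        # Stage 1: list as many IDs as this window's budget allows, then queue them.
        listed = min(to_list, id_calls_per_window * ids_per_call)
        to_list -= listed
        queued += listed
        # Stage 2: drain the queue with this window's lookup budget.
        done = min(queued, lookup_calls_per_window * users_per_call)
        queued -= done
        looked_up += done
    return windows


if __name__ == "__main__":
    # Placeholder budgets (hypothetical): 15 ID-listing calls and 180 lookups per window.
    print(windows_needed(250_000, 15, 180))  # 14
```

With those placeholder budgets the run takes 14 windows (about 3.5 hours), and the lookup stage is the bottleneck, which is why skipping users already known to be inactive pays off.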

One issue is how you define "active" and "inactive", because there might be edge cases. For example, what if you have folks who don't tweet much, but who DM, favorite, or retweet? You'll have to query a user's activity to pull out that extra data. Hopefully, this either validates what you already know or adds an idea or two that could be useful.
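One way to widen the definition is to collect every activity date you manage to retrieve (tweets, favorites, retweets) and call a user inactive only when all of them are stale. A minimal sketch, assuming those dates have already been fetched; the signal names are hypothetical dictionary keys, not fields of any LINQ to Twitter type.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def last_activity(signals: Mapping[str, Optional[datetime]]) -> Optional[datetime]:
    """Most recent date across all activity signals, or None if none were seen."""
    seen = [when for when in signals.values() if when is not None]
    return max(seen) if seen else None


def is_inactive(signals: Mapping[str, Optional[datetime]], now: datetime,
                max_idle: timedelta) -> bool:
    """Inactive only if no signal is more recent than max_idle."""
    latest = last_activity(signals)
    return latest is None or now - latest > max_idle


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Hypothetical user: no tweets for over a year, but favorited something 3 days ago.
    quiet_favoriter = {"tweet": now - timedelta(days=400), "favorite": now - timedelta(days=3)}
    print(is_inactive(quiet_favoriter, now, timedelta(days=180)))  # False
```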

Note: Consider Gnip if you're willing to pay to avoid the rate limits.
