What is the right way to build a Twitter app on Google App Engine?

Question

I am trying to develop a Twitter app on Google App Engine. The app collects all tweets from a Twitter user, that user's followers, their followers, and so on. It typically collects 500 tweets per user per run and then inserts that user's data into the database.

The tweet collection has to run every hour. Currently I am using cron jobs for this, but they produce a lot of "Deadline exceeded" errors, even for a single user, which is not a good sign. I am using Python. What should I use for this? From searching the web I learned that task queues can be used together with cron, but I have no idea how to do that. I would be very thankful if someone could help me with that. Is there any other approach I could use?

Answer 1

To avoid DeadlineExceededError exceptions, use multiple deferred tasks on push task queues. With task queues, it's easier to break up the work into smaller units, which prevents any individual task from exceeding the 10-minute deadline allocated to task queue requests.

With the Task Queue API, applications can perform work that is initiated by a user request but runs outside of it. If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queues to be executed later.

Deferred tasks are push-queue tasks created with the deferred library: you hand it a function and its arguments, and App Engine runs that call later in a separate request, optionally after a delay. Here is a short sample of how to create a deferred task:

import logging

from google.appengine.ext import deferred


def do_something_expensive(a, b, c=None):
    logging.info("Fetching Twitter feeds!")
    # Fetch the Twitter data here


# Somewhere else - pass in the parameters needed by the Twitter API.
# Each call enqueues one task that runs in its own request.
deferred.defer(do_something_expensive, "BobsTwitterParam1", "BobsTwitterParam2", c=True)
deferred.defer(do_something_expensive, "BobsFriendTwitterParam1", "BobsFriendTwitterParam2", c=True)

Fetching data from Twitter users this way is recursive by nature, since you're fetching data for followers of followers and so forth. Run as a single process, it can be quite expensive and would likely exceed the deadline.

A task must finish executing and send an HTTP response value between 200–299 within 10 minutes of the original request. This deadline is separate from user requests, which have a 60-second deadline. If your task's execution nears the limit, App Engine raises a DeadlineExceededError (from the module google.appengine.runtime) that you can catch to save your work or log progress before the deadline passes. If the task failed to execute, App Engine retries it based on criteria that you can configure.
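As a rough illustration of catching the deadline to save progress, here is a minimal sketch. The names collect_tweets, fetch_page, and save_progress are hypothetical and not part of any App Engine or Twitter API; only DeadlineExceededError from google.appengine.runtime comes from the text above. The fallback class exists only so the sketch can run outside App Engine.

```python
try:
    # Available only inside the App Engine Python runtime.
    from google.appengine.runtime import DeadlineExceededError
except ImportError:
    # Stand-in so the sketch can be run outside App Engine.
    class DeadlineExceededError(Exception):
        pass


def collect_tweets(user_id, fetch_page, save_progress, start_page=0, max_pages=5):
    """Fetch pages of tweets for one user, checkpointing if the deadline hits.

    fetch_page(user_id, page) and save_progress(user_id, page, tweets) are
    hypothetical callables supplied by the application.
    Returns the page to resume from on a retry, or None when finished.
    """
    tweets = []
    page = start_page
    try:
        while page < max_pages:
            tweets.extend(fetch_page(user_id, page))
            page += 1
    except DeadlineExceededError:
        # Out of time: persist what we have so a retry can resume at `page`.
        save_progress(user_id, page, tweets)
        return page
    save_progress(user_id, page, tweets)
    return None
```

The returned page number could be passed to a follow-up task so the retry continues where this run stopped instead of starting over.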

However, if you separate each Twitter user into a completely separate task, then each task only runs for as long as it takes to fetch the Twitter results for a single user. Not only is this more efficient, but if there is a problem fetching one user's data, only that task would fail while the others should continue to execute.

In other words, don't try to fetch all of the data in a single task.
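The one-task-per-user idea can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: fetch_user_task, get_followers, store_tweets, and make_local_queue are hypothetical names, and the in-memory queue only simulates a task queue so the sketch runs anywhere. On App Engine, enqueue would be roughly deferred.defer.

```python
from collections import deque


def fetch_user_task(user_id, depth, get_followers, store_tweets, enqueue, seen):
    """Handle exactly one user, then enqueue one follow-up task per follower.

    get_followers and store_tweets are hypothetical application callables.
    """
    store_tweets(user_id)
    if depth <= 0:
        return
    for follower_id in get_followers(user_id):
        if follower_id in seen:
            continue  # already scheduled, avoid cycles in the follower graph
        seen.add(follower_id)
        enqueue(fetch_user_task, follower_id, depth - 1,
                get_followers, store_tweets, enqueue, seen)


def make_local_queue():
    """In-memory stand-in for a task queue (assumption, for local runs only)."""
    pending = deque()

    def enqueue(func, *args):
        pending.append((func, args))

    def run_all():
        while pending:
            func, args = pending.popleft()
            func(*args)

    return enqueue, run_all


if __name__ == "__main__":
    graph = {"alice": ["bob", "carol"], "bob": ["dave"]}
    stored = []
    enqueue, run_all = make_local_queue()
    enqueue(fetch_user_task, "alice", 2, lambda u: graph.get(u, []),
            stored.append, enqueue, {"alice"})
    run_all()
    print(stored)
```

Note that the deferred library pickles a task's arguments, so a shared in-memory set like seen would not survive across real tasks; on App Engine that bookkeeping would have to live in shared storage such as the datastore.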

Alternatively, if for whatever reason these tasks do exceed the 10-minute threshold, which is unlikely, look into Backends.

Solution 1
4 2012-05-20 19:29:48
