Guaranteeing that a task is added to a push queue only once in GAE

I want to ensure that a task, especially one that operates on a single entity, is added to a push queue at most once until the previously added task has finished. After that, I should be able to add the same task for the same entity again.

A simple example is a task that updates entity A. I want to be able to:

  1. Add task X, which updates entity A, to a push queue.
  2. While task X is in the queue for entity A, have all other attempts to add task X for entity A fail.
  3. Once task X has finished, add task X for entity A again.

The simple solution seems to be to use a task name that incorporates both the name of task X and the unique ID of entity A.

However, I think this approach doesn't satisfy condition 3: task names get "tombstoned" for a period I can't control and can't be reused until it ends.

From the docs:

An advantage of assigning your own task names is that named tasks are de-duplicated, which means you can use task names to guarantee* that a task is only added once. De-duplication continues for 9 days after the task is completed or deleted.

Does this mean task names can't be reused for 9 days?

Indeed, task names can't be reused for 9 days after they are no longer in the queue. This is probably a safety measure to ensure that all traces of the previous identically named task are flushed from the entire distributed infrastructure.
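To make the conflict with condition 3 concrete, here is a toy in-memory model of name de-duplication. It is only a sketch: `ToyQueue`, `NameTaken`, and `naive_task_name` are hypothetical names invented for illustration, not part of the App Engine API, and the 9-day constant is simply taken from the docs quoted in the question.

```python
import re

# Taken from the docs quoted in the question; treat the exact value as an assumption.
TOMBSTONE_SECONDS = 9 * 24 * 3600


class NameTaken(Exception):
    """Raised by the toy queue when a name is still queued or tombstoned."""


class ToyQueue:
    """Hypothetical in-memory stand-in for a push queue's name de-duplication."""

    def __init__(self):
        self._queued = set()
        self._reusable_at = {}  # name -> time at which the name can be used again

    def add(self, name, now):
        if name in self._queued or self._reusable_at.get(name, 0) > now:
            raise NameTaken(name)
        self._queued.add(name)

    def finish(self, name, now):
        # Finishing a task frees the queue slot but tombstones the name.
        self._queued.remove(name)
        self._reusable_at[name] = now + TOMBSTONE_SECONDS


def naive_task_name(task, entity_id):
    # Keep only letters, digits, underscores and hyphens in the name.
    return re.sub(r"[^A-Za-z0-9_-]", "-", "%s-%s" % (task, entity_id))
```

In this model, conditions 1 and 2 hold, but re-adding the same name right after `finish` still raises `NameTaken`, which is exactly the tombstoning problem described above.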

You could encode the current timestamp, rounded to the full second, in the task name. That would limit your actual write rate to 1/s, which is the maximum average write rate to the same entity group anyway. If you fail to enqueue the task because a task with that name is already in the queue, try to enqueue one for the next second (unless you have some alternate way of triggering another update task). Put the timestamp towards the end of the task name, not the beginning, to avoid the performance implications mentioned in the same doc you referenced.
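A minimal sketch of that idea follows. The `add_task` callable and the `TaskNameInUse` exception are hypothetical stand-ins for whatever call and error your queue client uses when a name is already taken (on App Engine's Python `taskqueue` API I believe the corresponding errors are `TaskAlreadyExistsError` and `TombstonedTaskError`, but treat that as an assumption). The timestamp is truncated to whole seconds and placed at the end of the name.

```python
import time


class TaskNameInUse(Exception):
    """Hypothetical stand-in for the queue's 'name already used' errors."""


def timestamped_name(task, entity_id, second):
    # The timestamp goes last so that names do not start with a sequential prefix.
    return "%s-%s-%d" % (task, entity_id, second)


def enqueue_once_per_second(add_task, task, entity_id, now=None, max_attempts=2):
    """Try the current second first, then the following ones.

    Returns the task name that was accepted, or None if every attempt
    hit a name that was already in use.
    """
    second = int(time.time() if now is None else now)
    for offset in range(max_attempts):
        name = timestamped_name(task, entity_id, second + offset)
        try:
            add_task(name)
            return name
        except TaskNameInUse:
            continue
    return None
```

With the default `max_attempts=2`, a caller that loses the race for the current second gets one try at the next second and then gives up, which keeps at most one pending task per entity per second.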

I had this use case in the past: I needed to make lots of small updates to a single entity, but the updates did not need to be reflected immediately. I solved it by batching the updates in a pull queue and having a cron job run every X minutes to pull a number of tasks and apply them as a batch update. In my case the cron job simply enqueues a task to a push queue. That task then consumes from the pull queue and does a transactional update.

Reference doc: https://cloud.google.com/datastore/docs/articles/fast-and-reliable-ranking-in-datastore/
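The batching pattern can be sketched roughly as below. This is an in-memory illustration only: `ToyPullQueue`, `apply_batch`, and the dict used as a `store` are hypothetical, and the updates are assumed to be integer deltas. A real pull queue would lease tasks and delete them after processing, and the write would happen inside a datastore transaction.

```python
from collections import defaultdict, deque


class ToyPullQueue:
    """Hypothetical in-memory stand-in for a pull queue."""

    def __init__(self):
        self._tasks = deque()

    def add(self, entity_id, delta):
        # Producers record a small update instead of writing the entity.
        self._tasks.append((entity_id, delta))

    def lease(self, max_tasks):
        # Hand out up to max_tasks pending updates, oldest first.
        batch = []
        while self._tasks and len(batch) < max_tasks:
            batch.append(self._tasks.popleft())
        return batch


def apply_batch(pull_queue, store, max_tasks=100):
    """Body of the push task triggered by cron.

    Drains up to max_tasks updates and writes each affected entity once.
    Returns the number of entities written.
    """
    totals = defaultdict(int)
    for entity_id, delta in pull_queue.lease(max_tasks):
        totals[entity_id] += delta
    for entity_id, total in totals.items():
        # In a real app this write would run inside a transaction.
        store[entity_id] = store.get(entity_id, 0) + total
    return len(totals)
```

The point of the design is that many small updates to entity A collapse into a single write per cron interval, so the per-entity write rate no longer depends on how often updates arrive.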
