
Distribute processing of records of scheduler job

I am working on a use case where I have a cron job scheduled (via Quartz) which reads certain entries from the db and processes them. In each schedule I can get thousands of records that need to be processed, and processing each record takes time (seconds to minutes). Currently all of those records are processed on a single node (the node elected by Quartz). My challenge now is to parallelize the processing of these records. Please help me with the following concerns:

  1. How can I distribute these records/tasks across a cluster of machines?
  2. If any machine fails after processing a few records, the remaining records should be processed by the healthy nodes in the cluster.
  3. How do I get a signal that all record processing is finished?

Create cron jobs that run separately on each host at the desired frequency. You will need some form of lock on each record, or some form of range lock on the record set, to ensure that the servers process mutually exclusive sets of records.

e.g.: You can add the following new fields to every record:

Locked By Server
Lock Duration (or lock expiration time)
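
To make that concrete, here is a minimal sketch of how a record carrying those two fields might be modelled on the Java side. The class and field names (WorkRecord, lockedBy, lockExpiresAt) are illustrative assumptions, not part of the original answer:

```java
import java.time.Instant;

// Hypothetical record entity; class, field and column names are assumptions for illustration.
public class WorkRecord {
    private long id;
    private String payload;          // the data that needs processing
    private String lockedBy;         // id of the server/node currently holding the lock, null if unlocked
    private Instant lockExpiresAt;   // lock expiration time; a crashed node's lock simply times out

    public boolean isLockAvailable(Instant now) {
        // A record can be claimed if it was never locked or its lock has expired.
        return lockedBy == null || (lockExpiresAt != null && lockExpiresAt.isBefore(now));
    }

    // getters/setters omitted for brevity
}
```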

On each run, each cron picks a set of records that have expired or empty locks and acquires the lock on a small batch of them by writing these two fields. It then proceeds to process them. If the node crashes or gets stuck, the lock expires; otherwise it is released on completion.
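
Below is a rough JDBC sketch of that claim/process/release cycle, assuming a table named work_record with columns locked_by, lock_expires_at and a processed flag (all of these names, the batch size, and the lock duration are assumptions, not from the original answer). The key point is the conditional UPDATE whose WHERE clause re-checks that the lock is still empty or expired, so two nodes can never claim the same record:

```java
import java.sql.*;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class RecordClaimer {
    private static final int BATCH_SIZE = 50;
    private static final long LOCK_SECONDS = 600; // should comfortably exceed per-record processing time

    /** Try to claim up to BATCH_SIZE records for this node; returns the ids actually claimed. */
    public List<Long> claimBatch(Connection con, String nodeId) throws SQLException {
        Timestamp now = Timestamp.from(Instant.now());
        Timestamp expiry = Timestamp.from(Instant.now().plusSeconds(LOCK_SECONDS));
        List<Long> claimed = new ArrayList<>();

        // Step 1: find candidate records whose lock is empty or has already expired.
        List<Long> candidates = new ArrayList<>();
        try (PreparedStatement select = con.prepareStatement(
                "SELECT id FROM work_record " +
                "WHERE processed = FALSE AND (locked_by IS NULL OR lock_expires_at < ?)")) {
            select.setTimestamp(1, now);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next() && candidates.size() < BATCH_SIZE) {
                    candidates.add(rs.getLong(1));
                }
            }
        }

        // Step 2: claim each candidate with a conditional UPDATE; the WHERE clause repeats the
        // "empty or expired" check, so if another node won the race the update affects 0 rows.
        try (PreparedStatement update = con.prepareStatement(
                "UPDATE work_record SET locked_by = ?, lock_expires_at = ? " +
                "WHERE id = ? AND (locked_by IS NULL OR lock_expires_at < ?)")) {
            for (long id : candidates) {
                update.setString(1, nodeId);
                update.setTimestamp(2, expiry);
                update.setLong(3, id);
                update.setTimestamp(4, now);
                if (update.executeUpdate() == 1) {
                    claimed.add(id);
                }
            }
        }
        return claimed;
    }

    /** Mark a record as done and release its lock once processing has finished. */
    public void release(Connection con, long recordId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE work_record SET processed = TRUE, locked_by = NULL, lock_expires_at = NULL " +
                "WHERE id = ?")) {
            ps.setLong(1, recordId);
            ps.executeUpdate();
        }
    }
}
```

Each node's cron would call claimBatch, process the returned ids, and release each record as it finishes. A record whose node died mid-way is picked up again automatically once its lock expires, which gives the failover behaviour asked for in concern 2.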

