Running Spring job for a defined interval of time

I have a database that contains a table document. This table defines the paths to documents I need to process.
Processing a document is very heavy and may take several minutes for a single document.
I have more than 200 000 files to process.
The documents are hosted in an in-production application, so I have to process them at night.
My question is: is it possible to define a spring-batch job that queries the documents that are not yet processed from the DB and processes them, and then schedule that job (with Quartz) to be stopped at, say, 8am and restarted at 8pm each day?

EDIT
I think I should make myself clearer:
My question is: should I have one job that processes all documents, and make it stop every morning and restart at the end of the day? Or should I make the job process only one document each time?

So far I am using just one job to iterate over all documents, since all the examples I found in the Spring Batch docs talk about reading the whole table (with a reader) and processing the data.
If this is the right approach, how can I interrupt the job execution so that it continues at the end of the day?
Or should I just use one job per document?

Yes, this is possible.

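The cron expression for this would be something like:

0 0/1 20-7 ? * MON-FRI

Do confirm this (it's been a while since I looked at cron expressions), but it should fire every minute from 20:00 up to 07:59, Monday to Friday. The hour field ends at 7 rather than 8 because an hour range includes its last hour, so 20-8 would keep firing until 08:59. The expression also relies on Quartz accepting an hour range that wraps past midnight, and MON-FRI applies to the calendar day of each firing, so Saturday's early hours are not covered while Monday's are.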
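
A window that wraps past midnight is easy to get wrong at its boundaries. As a rough illustration in plain Java (this is not Quartz API; NightWindow and inWindow are made-up names), the intended check for "from 20:00 up to, but not including, 08:00" could look like this:

```java
import java.time.LocalTime;

// Hypothetical helper, not part of Quartz or Spring.
class NightWindow {
    static final LocalTime START = LocalTime.of(20, 0); // 8pm, inclusive
    static final LocalTime END = LocalTime.of(8, 0);    // 8am, exclusive

    // The window wraps past midnight, so a time is inside it when it is
    // at or after START, or strictly before END.
    static boolean inWindow(LocalTime t) {
        return !t.isBefore(START) || t.isBefore(END);
    }

    public static void main(String[] args) {
        System.out.println(inWindow(LocalTime.of(19, 59))); // false
        System.out.println(inWindow(LocalTime.of(20, 0)));  // true
        System.out.println(inWindow(LocalTime.of(7, 59)));  // true
        System.out.println(inWindow(LocalTime.of(8, 0)));   // false
    }
}
```

A job could call a check like this before picking up each new unit of work, so that nothing new is started once the window has closed.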

Make sure the Quartz job cannot run concurrently with itself (see: http://static.springsource.org/spring/docs/3.0.x/reference/scheduling.html ); with Spring's MethodInvokingJobDetailFactoryBean this means setting the concurrent property to false. Then you won't have to worry about overlapping runs. You can select a defined number of documents for processing (say 10) in each run, and every minute up to 8am Quartz will fire off another run if the previous one has finished. When the last run finishes in the morning, the job will not be fired again until 8pm.

Note that the last run might start at 7:59 and run past the 8am mark, so you might want to bring the end time a little earlier to compensate.

Edit:

I think a more fine-grained approach (not necessarily a single document, but maybe a block) is more suitable for batching and scheduling. This effectively uses Quartz to do the looping that you would otherwise do inside a single job, and it saves you from having to handle the scheduling element yourself.
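
To make the block-per-run idea concrete, here is a minimal sketch in plain Java with no Quartz or Spring Batch involved. Everything in it is an assumption made for illustration: BlockRunner, runOnce and process are hypothetical names, and an in-memory queue stands in for the unprocessed rows of the document table. Each call to runOnce plays the role of one trigger firing, and the guard flag mimics a scheduler that refuses to overlap runs:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: one call to runOnce() corresponds to one trigger firing.
class BlockRunner {
    private final Deque<String> pending;                       // stands in for unprocessed rows
    private final List<String> processed = new ArrayList<>();  // stands in for rows marked as processed
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final int blockSize;

    BlockRunner(Collection<String> paths, int blockSize) {
        this.pending = new ArrayDeque<>(paths);
        this.blockSize = blockSize;
    }

    // Processes at most blockSize documents.
    // Returns the number processed, or -1 if a previous run is still active.
    int runOnce() {
        if (!running.compareAndSet(false, true)) {
            return -1; // skip this firing instead of overlapping
        }
        try {
            int done = 0;
            while (done < blockSize && !pending.isEmpty()) {
                String path = pending.poll();
                process(path);
                processed.add(path); // "mark as processed"
                done++;
            }
            return done;
        } finally {
            running.set(false);
        }
    }

    // Placeholder for the heavy per-document work.
    void process(String path) {
    }

    int processedCount() {
        return processed.size();
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            paths.add("/docs/" + i + ".pdf"); // made-up paths
        }
        BlockRunner runner = new BlockRunner(paths, 10);
        System.out.println(runner.runOnce()); // 10
        System.out.println(runner.runOnce()); // 10
        System.out.println(runner.runOnce()); // 5
        System.out.println(runner.runOnce()); // 0
    }
}
```

Because each run handles only a small block, a run that starts shortly before the end of the window overshoots it by at most one block's worth of work.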

You will want a job that processes one document from the DB at a time.

With a cron trigger in Spring Quartz you can schedule it to run from 8 PM to 7:30 AM (if one job takes around 30 minutes) at regular intervals (say every 30 minutes).

You can have the job do the following:

  1. Read 1 (unprocessed) document path from the DB.
  2. Process the document.
  3. Delete it (or mark it as processed) in the DB.
  4. Commit.

To make a job start on a schedule, you can use a Quartz scheduler. However, this will not terminate the job at a specific time. To achieve this you should:

  1. Make sure your job is restartable and works on the smallest possible units of work.
  2. Make a custom Job wrapper that starts a timer when your job starts and polls it every minute to determine whether the job must shut down; when it must, have the wrapper ask the running job execution to stop.
  3. Because the job is restartable, it will be able to resume from the point where it left off the next time the Quartz scheduler calls it.
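
The three points above can be illustrated without any framework. This is only a sketch in plain Java under stated assumptions: RestartableJob, run and the in-memory nextIndex checkpoint are hypothetical, and a real Spring Batch job would keep its restart state in the job repository rather than in a field. The stop condition is passed in as a function, so it can be a clock-based deadline in production and a simple counter in a demo:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a job that stops between units of work and resumes later.
class RestartableJob {
    private final List<String> paths;
    private final List<String> processed = new ArrayList<>();
    private int nextIndex = 0; // checkpoint; a real job would persist this

    RestartableJob(List<String> paths) {
        this.paths = new ArrayList<>(paths);
    }

    // Processes documents one by one and checks the stop condition before each unit.
    // Returns true when every document has been processed, false when stopped early.
    boolean run(BooleanSupplier stopRequested) {
        while (nextIndex < paths.size()) {
            if (stopRequested.getAsBoolean()) {
                return false; // checkpoint already points at the next unprocessed document
            }
            String path = paths.get(nextIndex);
            process(path);
            processed.add(path);
            nextIndex++; // advance the checkpoint only after the unit has completed
        }
        return true;
    }

    // Placeholder for the heavy per-document work.
    void process(String path) {
    }

    int processedCount() {
        return processed.size();
    }

    public static void main(String[] args) {
        RestartableJob job = new RestartableJob(Arrays.asList("a", "b", "c", "d", "e"));

        // First "night": the stop condition becomes true on its third check.
        int[] checks = {0};
        boolean finished = job.run(() -> ++checks[0] > 2);
        System.out.println(finished + " " + job.processedCount()); // false 2

        // Second "night": no stop requested, so the job resumes at the third document.
        finished = job.run(() -> false);
        System.out.println(finished + " " + job.processedCount()); // true 5
    }
}
```

In a real setup the stop condition could be something like a comparison of the current time against the 8am cut-off. The smaller each unit of work is, the sooner the job reacts to it.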
