
Running Spring job for a defined interval of time

I have a database that contains a table named document. This table stores the paths of the documents I need to process.
Processing a document is very heavy and may take several minutes for a single document.
I have more than 200,000 files to process.
The documents are hosted in an in-production application, so I have to process them at night.
My question is: is it possible to define a Spring Batch job that queries the unprocessed documents from the DB and processes them, and then schedule that job (with Quartz) to be stopped at, say, 8am and restarted at 8pm every day?

EDIT
I think I should make myself clearer:
My question is: should I have one job that processes all documents, and make it stop every morning and restart at the end of the day? Or should I make the job process only one document each time it runs?

For now, I am using just one job to iterate over all documents, since all the examples I found in the Spring Batch docs read the whole table (with a reader) and process the data.
If this is the right approach, how can I interrupt the job execution so that it continues at the end of the day?
Or should I just use one job per document?

Yes, this is possible.

The cron expression for this would be something like:

0 0/1 20-7 ? * MON-FRI

Just confirm this (it's been a while since I looked at cron expressions), but this should fire every minute from 20:00 until 07:59, Monday to Friday. The hour range is inclusive, so writing 20-8 instead would keep firing until 08:59.

Quartz jobs can be kept from running concurrently within Spring (for a MethodInvokingJobDetailFactoryBean, set the concurrent flag to false; see: http://static.springsource.org/spring/docs/3.0.x/reference/scheduling.html ), so you won't have to worry about overlapping runs. You can then select a fixed number of documents for processing (say 10) in each run, and every minute up to 8am Quartz will fire off another run if the previous one has finished. When the last processing run finishes in the morning, nothing will be fired again until 8pm.

Note that the last run might start just before 8am and run past the 8am mark, so you might want to bring the end time a little earlier to compensate.
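As a complement to the cron window, the job itself can refuse to start new work too close to the end of the night. Below is a minimal sketch in plain Java, assuming a window of 20:00 to 08:00 and a 30-minute safety margin; the class name NightWindow and the method mayStartAt are made up for illustration and are not part of Quartz or Spring.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Sketch: decides whether a new run may start inside a nightly window. */
class NightWindow {
    private final LocalTime start;  // e.g. 20:00
    private final LocalTime end;    // e.g. 08:00
    private final Duration margin;  // stop launching new runs this long before 'end'

    NightWindow(LocalTime start, LocalTime end, Duration margin) {
        this.start = start;
        this.end = end;
        this.margin = margin;
    }

    /** True if 'now' lies in [start, end - margin); the window may wrap past midnight. */
    boolean mayStartAt(LocalTime now) {
        LocalTime cutoff = end.minus(margin);
        if (start.isBefore(cutoff)) {
            // Window inside a single day, e.g. 09:00-17:00.
            return !now.isBefore(start) && now.isBefore(cutoff);
        }
        // Window wraps past midnight, e.g. 20:00-07:30.
        return !now.isBefore(start) || now.isBefore(cutoff);
    }

    public static void main(String[] args) {
        NightWindow window =
                new NightWindow(LocalTime.of(20, 0), LocalTime.of(8, 0), Duration.ofMinutes(30));
        System.out.println(window.mayStartAt(LocalTime.of(23, 15))); // true
        System.out.println(window.mayStartAt(LocalTime.of(7, 45)));  // false
    }
}
```

A run would call mayStartAt(LocalTime.now()) first and return immediately when it gets false. The margin should be roughly the longest time one run is expected to take.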

Edit:

I think a more fine-grained approach (not necessarily a single document, but maybe a block of documents) is more suitable for batching and scheduling. This effectively uses Quartz to do the looping that you would otherwise be doing inside a single job, and it means the job itself never has to worry about the scheduling element!

You will want to have a job to process one document from DB at a time.

With a cron trigger in Spring Quartz you can schedule it to run from 8 PM to 7:30 AM (if one job takes around 30 minutes) at regular intervals (say, every 30 minutes).

The job can do the following:

  1. Read 1 (unprocessed) document path from the DB.
  2. Process the document.
  3. Delete it (or mark it as processed) in the DB.
  4. Commit.
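The steps above can be sketched roughly as follows. This is plain Java with an in-memory map standing in for the document table, so that it runs without a database; InMemoryDocumentTable, runOnce and the sample paths are hypothetical names, and a real job would do the lookup and the update through JDBC or a repository inside one transaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

/** Sketch of one scheduled run that handles a single unprocessed document. */
class SingleDocumentRun {

    /** Hypothetical stand-in for the document table: path -> processed flag. */
    static class InMemoryDocumentTable {
        private final Map<String, Boolean> rows = new LinkedHashMap<>();

        void add(String path) {
            rows.put(path, false);
        }

        /** Returns the first path whose processed flag is still false, if any. */
        Optional<String> nextUnprocessed() {
            return rows.entrySet().stream()
                    .filter(e -> !e.getValue())
                    .map(Map.Entry::getKey)
                    .findFirst();
        }

        void markProcessed(String path) {
            rows.put(path, true);
        }

        long remaining() {
            return rows.values().stream().filter(done -> !done).count();
        }
    }

    /** Returns true if a document was processed, false if none was left. */
    static boolean runOnce(InMemoryDocumentTable table, Consumer<String> processor) {
        Optional<String> next = table.nextUnprocessed();
        if (!next.isPresent()) {
            return false;
        }
        // If processing throws, the row is not marked and will be picked up again.
        processor.accept(next.get());
        table.markProcessed(next.get());
        return true;
    }

    public static void main(String[] args) {
        InMemoryDocumentTable table = new InMemoryDocumentTable();
        table.add("/docs/a.pdf");
        table.add("/docs/b.pdf");
        runOnce(table, path -> System.out.println("processing " + path));
        System.out.println(table.remaining() + " document(s) left");
    }
}
```

Marking the row only after processing succeeds means a failed or interrupted run leaves the document to be retried on the next trigger.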

To make a job start on a schedule, you can use a Quartz scheduler. However, this will not terminate the job at a specific time. To achieve this you should

  1. Make sure your job is restartable and working on the smallest possible units of work.
  2. Make a custom Job wrapper that starts a timer when the job starts and polls it every minute to determine whether the job must shut down. When it must, stop the job execution (in Spring Batch, for example, by asking the JobOperator to stop it).
  3. Because the job is restartable, it will be able to restart from the point it left off the next time the Quartz scheduler calls it.
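The three points above can be illustrated with a small plain-Java sketch. It is not Spring Batch code: StoppableRun, requestStop and the in-memory nextIndex are hypothetical stand-ins for the job wrapper, the stop request and the restart state that Spring Batch would keep in its job repository. A watchdog polls the clock and raises a flag once the deadline has passed; the worker checks the flag between units of work, and the next call to run continues from the saved position.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Sketch: a restartable loop that a polling watchdog can ask to stop. */
class StoppableRun {
    private final List<String> items;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private int nextIndex = 0; // stand-in for persisted restart state

    StoppableRun(List<String> items) {
        this.items = new ArrayList<>(items);
    }

    void requestStop() {
        stopRequested.set(true);
    }

    boolean isFinished() {
        return nextIndex >= items.size();
    }

    /** Processes items from the saved position; returns how many this run handled. */
    int run(Consumer<String> processor, Instant deadline, Duration pollInterval) {
        stopRequested.set(false);
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        // Poll the clock at a fixed rate and raise the flag once the deadline has passed.
        watchdog.scheduleAtFixedRate(() -> {
            if (!Instant.now().isBefore(deadline)) {
                requestStop();
            }
        }, 0, pollInterval.toMillis(), TimeUnit.MILLISECONDS);
        int processed = 0;
        try {
            // The flag is only checked between items, so one item is the unit of work.
            while (nextIndex < items.size() && !stopRequested.get()) {
                processor.accept(items.get(nextIndex));
                nextIndex++;
                processed++;
            }
        } finally {
            watchdog.shutdownNow();
        }
        return processed;
    }

    public static void main(String[] args) {
        StoppableRun job = new StoppableRun(Arrays.asList("a", "b", "c"));
        int done = job.run(item -> System.out.println("processing " + item),
                Instant.now().plusSeconds(3600), Duration.ofMinutes(1));
        System.out.println(done + " processed, finished=" + job.isFinished());
    }
}
```

Because the flag is only read between items, a stop request never interrupts a document halfway through, which is why the units of work should be as small as possible.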
