Is spring-batch for me, even though I don't have a use for ItemReader and ItemWriter?

spring-batch newbie: I have a series of batches that

  • read all new records (since the last execution) from some sql tables
  • upload all the new records to hadoop
  • run a series of map-reduce (pig) jobs on all the data (old and new)
  • download all the output to local and run some other local processing on all the output

The point is, I don't have any obvious "item". I don't want to deal with specific lines of text in my data; I work with all of it as one big chunk and don't want any commit intervals and such...

However, I do want to keep all these steps loosely coupled - as in, steps a+b+c might succeed for several days and accumulate processed data while step d keeps failing, and then when it finally succeeds it will read and process all of the output of its previous steps.

So: is my "item" a fictive "working item" that signifies the entire set of new data? Do I maintain a series of queues myself and pass these fictive working items between them?

thanks!

People always assume that the only use of Spring Batch is chunk processing. That is a huge feature, but what's overlooked is the visibility of the processing and the job control.

Give 5 people the same task with no Spring Batch and they're going to implement flow control and visibility their own way. Give 5 people the same task with Spring Batch and you may end up with custom tasklets all done differently, but getting access to the job metadata and starting and stopping jobs is going to be consistent. From my perspective it's a great tool for job management. If you already have your jobs written, you can implement them as custom tasklets if you don't want to rewrite them to conform to the 'item' paradigm; you'll still see benefits.
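
For illustration, a minimal custom tasklet could look like the sketch below. It just wraps existing logic in one shot; HadoopUploader.uploadNewRecords() is a hypothetical stand-in for code you already have:

    // a minimal sketch of a custom Tasklet wrapping existing job logic;
    // HadoopUploader is a hypothetical placeholder, not a real API
    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;

    public class UploadToHadoopTasklet implements Tasklet {

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            // do the whole unit of work in one go - no items, no commit intervals
            new HadoopUploader().uploadNewRecords();
            return RepeatStatus.FINISHED; // the step completes after a single pass
        }
    }

Each tasklet step still records its own StepExecution in the job repository, so you keep restartability and visibility even without the item paradigm.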

I don't see the problem. Your scenario seems like a classic application of Spring Batch to me.

  • read all new records (since the last execution) from some sql tables

Here, an item is a record.
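
For instance, a JdbcCursorItemReader can stream exactly those rows. A minimal sketch, assuming a source_table with a created_at column and a lastRun timestamp that you track yourself (e.g. from the job metadata); it would live in a @Configuration class:

    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.context.annotation.Bean;
    import org.springframework.jdbc.core.ColumnMapRowMapper;

    // sketch: stream the rows added since the last execution as Map items;
    // the table/column names and the lastRun parameter are assumptions
    @Bean
    public JdbcCursorItemReader<Map<String, Object>> newRecordsReader(DataSource ds, java.sql.Timestamp lastRun) {
        JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(ds);
        reader.setSql("SELECT * FROM source_table WHERE created_at > ?");
        reader.setPreparedStatementSetter(ps -> ps.setTimestamp(1, lastRun));
        reader.setRowMapper(new ColumnMapRowMapper()); // each row becomes a Map, no domain class needed
        return reader;
    }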

  • upload all the new records to hadoop

Same here.
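
A corresponding ItemWriter could push each chunk to HDFS. A rough sketch using the Hadoop FileSystem API; note that ItemWriter.write receives the chunk as a List in Spring Batch 2-4, and the target path, line format, and append support on the cluster are all assumptions:

    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.springframework.batch.item.ItemWriter;

    public class HdfsItemWriter implements ItemWriter<Map<String, Object>> {

        private final FileSystem fs;
        private final Path target;

        public HdfsItemWriter(Configuration hadoopConf, String targetPath) throws Exception {
            this.fs = FileSystem.get(hadoopConf);
            this.target = new Path(targetPath);
        }

        @Override
        public void write(List<? extends Map<String, Object>> items) throws Exception {
            // append one line per record; create the file on the first chunk
            try (FSDataOutputStream out = fs.exists(target) ? fs.append(target) : fs.create(target)) {
                for (Map<String, Object> item : items) {
                    out.writeBytes(item + "\n");
                }
            }
        }
    }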

  • run a series of map-reduce (pig) jobs on all the data (old and new)

Sounds like a StepListener or ChunkListener.
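
For example, a StepExecutionListener could kick off the pig scripts once the upload step completes. A sketch that just shells out to the pig CLI; the script path is a placeholder:

    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.StepExecutionListener;

    public class PigJobsListener implements StepExecutionListener {

        @Override
        public void beforeStep(StepExecution stepExecution) {
            // nothing to prepare before the step
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            try {
                // run the pig script over all the data (old and new)
                Process pig = new ProcessBuilder("pig", "-f", "/scripts/process_all.pig")
                        .inheritIO()
                        .start();
                return pig.waitFor() == 0 ? ExitStatus.COMPLETED : ExitStatus.FAILED;
            } catch (Exception e) {
                return ExitStatus.FAILED;
            }
        }
    }

Running the pig jobs as their own tasklet step instead of a listener would also work, and keeps them independently restartable.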

  • download all the output to local and run some other local processing on all the output

That's the next step.


The only problem I see is if you don't have domain objects for your records. But even then, you can work with maps or arrays while still using ItemReaders and ItemWriters.
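
Putting it together, a map-based chunk step might be wired like this, reusing the reader and writer sketches above (pre-5.0 StepBuilderFactory style; the bean name and the commit interval of 1000 are arbitrary choices):

    import java.util.Map;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.context.annotation.Bean;

    @Bean
    public Step fetchAndUploadStep(StepBuilderFactory steps,
                                   JdbcCursorItemReader<Map<String, Object>> reader,
                                   HdfsItemWriter writer) {
        return steps.get("fetchAndUpload")
                .<Map<String, Object>, Map<String, Object>>chunk(1000) // commit interval
                .reader(reader)
                .writer(writer)
                .build();
    }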
