
MapReduce/Aggregate operations in SpringBatch

Is it possible to do MapReduce style operations in SpringBatch?

I have two steps in my batch job. The first step calculates an average. The second step compares each value with the average to determine another value.

For example, let's say I have a huge database of student scores. The first step calculates the average score in each course/exam. The second step compares individual scores with the average to determine a grade based on a simple rule:

  1. A if the student scores above average
  2. B if the student's score equals the average
  3. C if the student scores below average
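The grading rule above can be sketched as a small piece of plain Java. In a Spring Batch job this logic would live inside an `ItemProcessor`; the class and method names here are hypothetical illustrations, not part of any framework API.

```java
// Sketch of the grading rule: compare a score against the course average.
// In Spring Batch this would typically be the body of an ItemProcessor.
public class GradeRule {
    public static String grade(double score, double average) {
        if (score > average) return "A";   // above average
        if (score == average) return "B";  // exactly the average
        return "C";                        // below average
    }
}
```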

Currently my first step is a SQL statement that selects the average and writes it to a table. The second step is a SQL statement that joins the average scores with the individual scores and uses a Processor to implement the rule.

Similar aggregation functions like avg and min are used a lot in my steps, and I'd really prefer to do this in Processors, keeping the SQL as simple as possible. Is there any way to write a Processor that aggregates results across multiple rows based on a grouping criterion and then writes the average/min to the output table once?

This pattern repeats a lot, and I'm not looking for a single-processor implementation that uses a SQL query to fetch both the average and the individual scores.

It is possible. You do not even need more than one step; map-reduce can be implemented in a single step. You can create a step with an ItemReader and an ItemWriter associated with it. Think of the ItemReader/ItemWriter pair as Map/Reduce. You can achieve the necessary effect by using a custom reader and writer with proper line aggregation. It might be a good idea for your reader/writer to implement the ItemStream interface so that Spring Batch saves intermediate state to the step's ExecutionContext.
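The aggregating-writer idea might look roughly like the sketch below: accumulate per-course sums and counts as chunks arrive, then emit one average per course at the end. This is plain Java under assumed names (a hypothetical `Score` record, a `write` method mirroring `ItemWriter.write`, an `averages` method standing in for the flush that would happen in `ItemStream.close()`); a real implementation would implement Spring Batch's `ItemWriter` and `ItemStream` interfaces and persist the partial sums in the `ExecutionContext` for restartability.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a reduce-style writer that aggregates across rows by a
// grouping key (the course). Names and shape are illustrative only.
public class AveragingWriter {
    public record Score(String course, double value) {}

    // course -> {sum, count}; in Spring Batch this partial state would be
    // saved to the ExecutionContext via ItemStream.update() for restarts.
    private final Map<String, double[]> sums = new HashMap<>();

    // Called once per chunk, like ItemWriter.write(...).
    public void write(List<Score> chunk) {
        for (Score s : chunk) {
            double[] acc = sums.computeIfAbsent(s.course(), k -> new double[2]);
            acc[0] += s.value();
            acc[1] += 1;
        }
    }

    // Called when the step ends (e.g. from ItemStream.close()):
    // emit one average per course.
    public Map<String, Double> averages() {
        Map<String, Double> out = new HashMap<>();
        sums.forEach((course, acc) -> out.put(course, acc[0] / acc[1]));
        return out;
    }
}
```

The design point is that the "reduce" state lives in the writer across chunks, which is exactly why implementing ItemStream matters: without persisting the partial sums, a restart would lose them.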

I tried it just for fun, but I think it is pointless, since your working capacity is limited to a single JVM; in other words, you cannot reach the production performance of a Hadoop cluster (or another real map-reduce implementation). It will also be really hard to scale as your data size grows.

Nice observation, but IMO currently useless for real-world tasks.

I feel that a batch processing framework should separate programming/configuration concerns from run-time concerns. It would be nice if Spring Batch provided a generic solution across all major batch processing runtimes, such as a single JVM, a Hadoop cluster (which also uses the JVM), etc.

-> Write batch programs using the Spring Batch programming/configuration model, which integrates other programming models like map-reduce, traditional Java, etc.

-> Select the runtime based on your needs (single JVM, Hadoop cluster, or NoSQL).

Spring Data attempts to solve part of this by providing a unified configuration model and API usage for various types of data sources.

