Spring Batch Parallel Processing executing one step multiple times

I am running a Spring Batch job in parallel, using SimpleAsyncTaskExecutor with the throttle-limit left at its default (4). The item reader reads lines from a text file, and each line is then processed. What actually happens is that a single line of the text file gets processed by 4 different threads, so a single chunk is executed 4 times.

Below is my batch.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
    <import resource="classpath*:/META-INF/spring/batch/override/**/*.xml" />
    <bean id="businessReader" class="com.rbsgbm.rates.eodtasks.batch.reader.BusinessItemReader"/>
    <bean id="businessProcessor" class="com.rbsgbm.rates.eodtasks.batch.processor.BusinessItemProcessor" />
    <bean id="businessWriter" class="com.rbsgbm.rates.eodtasks.batch.writer.BusinessItemWriter" />
    <bean id="deskReader" class="com.rbsgbm.rates.eodtasks.batch.reader.DeskItemReader"/>
    <bean id="deskProcessor" class="com.rbsgbm.rates.eodtasks.batch.processor.DeskItemProcessor" />
    <bean id="deskWriter" class="com.rbsgbm.rates.eodtasks.batch.writer.DeskItemWriter" />
    <bean class="com.rbsgbm.rates.eodtasks.batch.Tasklet.TradeSnapTasklet" id="tradeSnapTasklet"/>
    <bean class="com.rbsgbm.rates.eodtasks.batch.Tasklet.FoundryExtractTasklet" id="foundryExtractTasklet"/>
    <bean id="simpleFireTasklet"
        class="com.rbsgbm.rates.eodtasks.batch.Tasklet.SimpleFireTasklet" />

    <bean id="mdxMarketDataSnapTasklet"
        class="com.rbsgbm.rates.eodtasks.batch.Tasklet.MdxMarketDataSnapTasklet" />

    <bean id="stepListener" class="org.springframework.batch.core.listener.StepExecutionListenerSupport" />
    <bean id="restartJobListener" class="com.rbsgbm.rates.eodtasks.batch.listener.RestartListener"/>
    <bean id="failedStepListener" class="com.rbsgbm.rates.eodtasks.batch.listener.FailedStepStepExecutionListener"/>
    <bean id="taskExecutor"
        class="org.springframework.core.task.SimpleAsyncTaskExecutor">
    </bean>

    <job id="simpleDojJob"  xmlns="http://www.springframework.org/schema/batch">
        <step id="processBusiness" next="simpleFireTask">
            <tasklet>
                <chunk reader="businessReader" processor="businessProcessor"
                    writer="businessWriter" commit-interval="1" />
            </tasklet>

        </step>

        <step id="simpleFireTask" next="foundryTask">
            <tasklet task-executor="taskExecutor">
                <chunk reader="deskReader" processor="deskProcessor"
                    writer="deskWriter" commit-interval="1" />
            </tasklet>

        </step>

        <step id="foundryTask">
            <tasklet ref="foundryExtractTasklet"/>
            <listeners>
                    <listener ref="stepListener"/>
                    <listener ref="restartJobListener"/>
                    <listener ref="failedStepListener"/>
            </listeners>    
        </step>
    </job>
</beans>

If you want thread-safe readers and writers, you have to implement them that way yourself.

By default, every thread accesses the same instance of your reader or writer, potentially at the very same moment. If your reader and writer are not written with that in mind, they will not handle it correctly.

The easiest way to make them thread-safe is to mark the reader's read method and the writer's write method as synchronized.
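To illustrate the effect of synchronized on a shared reader, here is a minimal sketch that does not depend on Spring Batch. LineReader and readAll are hypothetical names made up for this example; LineReader merely stands in for a reader such as DeskItemReader, whose real implementation is not shown in the question. Several threads drain the same reader instance, and because read() is synchronized, each line is handed out exactly once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SynchronizedReadDemo {

    /** Hypothetical reader; a stand-in for a real ItemReader implementation. */
    static class LineReader {
        private final Iterator<String> lines;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        // synchronized: only one thread at a time may advance the iterator,
        // so every line is returned to exactly one caller. Returns null when
        // the input is exhausted, following the ItemReader convention.
        public synchronized String read() {
            return lines.hasNext() ? lines.next() : null;
        }
    }

    /** Drains one shared reader with several threads and returns every line seen. */
    public static List<String> readAll(List<String> input, int threads)
            throws InterruptedException {
        LineReader reader = new LineReader(input);
        ConcurrentLinkedQueue<String> seen = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                String line;
                while ((line = reader.read()) != null) {
                    seen.add(line);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> result = readAll(Arrays.asList("a", "b", "c", "d", "e"), 4);
        System.out.println(result.size() + " lines read");
    }
}
```

The order in which the lines arrive is not deterministic, only the fact that none is duplicated or lost. The price is that reads are serialized, so the parallelism only pays off in the processor and writer.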

If you cannot change the code of the reader or writer, implement a simple wrapper that delegates to it:

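import org.springframework.batch.item.ItemReader;

public class SynchronizedItemReader<T> implements ItemReader<T>
{
    private ItemReader<T> delegate;

    public void setDelegate(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    // Only one thread at a time may call read() on the delegate.
    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}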

But note: if you also implement ItemStream to track what the writer has successfully committed (so that the job can be restarted at that position), you have to handle that as well, because chunks can overtake each other.
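The overtaking problem can be made concrete with a small simulation. This is a simplified, single-threaded replay of one possible interleaving, not Spring Batch code: CountingReader, currentPosition, and itemsAfterRestart are hypothetical names, and currentPosition only stands in for whatever an ItemStream-style update would persist as the restart point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OvertakingChunksDemo {

    /** Hypothetical reader that remembers how many items it has handed out. */
    static class CountingReader {
        private final List<String> items;
        private int position = 0;

        CountingReader(List<String> items) {
            this.items = items;
        }

        synchronized String read() {
            return position < items.size() ? items.get(position++) : null;
        }

        // Stand-in for the value a restartable reader would save on commit.
        synchronized int currentPosition() {
            return position;
        }
    }

    /** Returns the items that a restart from the saved position would still read. */
    public static List<String> itemsAfterRestart(List<String> items) {
        CountingReader reader = new CountingReader(items);

        String chunkA = reader.read(); // "thread A" reads the first item
        String chunkB = reader.read(); // "thread B" reads the second item

        // Thread B commits first and saves the reader position. That position
        // already counts thread A's item, although A has not committed yet.
        int savedPosition = reader.currentPosition();

        // Thread A now fails, so chunkA is never written. A restart that
        // resumes at savedPosition silently skips it.
        return new ArrayList<>(items.subList(savedPosition, items.size()));
    }

    public static void main(String[] args) {
        // prints [line-2]: line-0 was never written, yet it is not re-read
        System.out.println(itemsAfterRestart(Arrays.asList("line-0", "line-1", "line-2")));
    }
}
```

In other words, a single saved position cannot describe which chunks have really been committed once several threads share the reader. One common way out, if restarting in the middle of the file is not a requirement, is to switch off state saving on the reader so that a restart starts from the beginning.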
