
Mule batch - processing records in batch block and aggregating to file

I have an input file with 500k records. I need to process them in batches, apply a transformation, and write the results to an output file. I'm experimenting with the flow below. The block size is set to 1000, but the output file contains only 1000 records; the remaining 490k records are lost.

As I understand it, batch starts a new instance for each block, so every 1000 records is processed by a new thread. Are these threads overwriting each other's output? How do I collect all of the transformed records into the output file?

    <flow name="poll-inbound-file">
        <file:inbound-endpoint path="${file.inbound.location}"
            pollingFrequency="${file.polling.frequency}" responseTimeout="10000"
            doc:name="File" metadata:id="abce53af-7d82-411a-a75a-5cd8ae8e55ae"
            fileAge="${file.fileage}" moveToDirectory="${file.outbound.location}"/>
        <custom-interceptor
            class="com.example.TimerInterceptor" doc:name="Timer" />

        <dw:transform-message doc:name="Transform Message"
            metadata:id="dcf84872-5aca-404f-9169-d448c9e4cd76">
            <dw:input-payload mimeType="application/csv" />
            <dw:set-payload><![CDATA[%dw 1.0
%output application/java
---
payload as :iterator]]></dw:set-payload>
        </dw:transform-message>
        <batch:job name="process-batchBatch" block-size="${batch.blocksize}">
            <batch:process-records>
                <batch:step name="Batch_Step1">
                    <logger level="TRACE" doc:name="Logger" message="#[payload]" />
                </batch:step>
                <batch:step name="Batch_Step2">
                    <logger level="TRACE" doc:name="Logger" message="#[payload]" />
                </batch:step>
                <batch:step name="Batch_Step3">
                    <batch:commit doc:name="Batch Commit" size="1000">
                        <expression-component doc:name="Expression"><![CDATA[StringBuilder sb = new StringBuilder();
for (String s : payload) {
    sb.append(s);
    sb.append(System.lineSeparator());
}
payload = sb.toString();]]></expression-component>
                        <file:outbound-endpoint path="${file.outbound.location}"
                            responseTimeout="10000" doc:name="File" />
                    </batch:commit>
                </batch:step>
            </batch:process-records>
            <batch:on-complete>
                <logger
                    message="******************************************** Batch Report **************************************"
                    level="INFO" doc:name="Logger" />
            </batch:on-complete>
        </batch:job>
    </flow>

Writing to a file from multiple threads at the same time is generally not safe. Instead, write your results to a queue such as ActiveMQ (or similar) and have another flow that reads from the queue and then writes to the file. You can decide whether to start consuming from the queue before or after you have finished processing the file.
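A rough sketch of that approach, assuming Mule 3 with an in-memory VM queue rather than ActiveMQ; the queue path `records.out`, the connector name `appendingFileConnector`, and the consumer flow name are illustrative, not from the original config:

    <!-- File connector with append enabled, so successive queue
         messages accumulate in one output file. -->
    <file:connector name="appendingFileConnector" outputAppend="true" />

    <!-- Inside Batch_Step3: publish each aggregated chunk to a queue
         instead of writing the file directly from the batch thread. -->
    <batch:commit doc:name="Batch Commit" size="1000">
        <vm:outbound-endpoint path="records.out" doc:name="VM" />
    </batch:commit>

    <!-- A separate flow drains the queue; since only this flow touches
         the file, the writes are serialized. -->
    <flow name="write-output-file">
        <vm:inbound-endpoint path="records.out" doc:name="VM" />
        <file:outbound-endpoint path="${file.outbound.location}"
            connector-ref="appendingFileConnector" doc:name="File" />
    </flow>

You would still build the newline-separated string in the batch commit (as in the original expression component) before sending it to the queue, so each VM message carries one ready-to-append chunk.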
