
Apache Camel Splitter, Threadpool and JMS

I have defined the following route in Spring XML to split the rows of a text file and send each row to a JMS queue:

<bean id="myPool" class="java.util.concurrent.Executors" factory-method="newCachedThreadPool"/>

<camelContext id="concurrent-route-context" xmlns="http://camel.apache.org/schema/spring" trace="true">
    <route id="inbox-threadpool-split-route">
        <from uri="{{inbox.uri}}" />
        <log message="Starting to process file: ${header.CamelFileName}" />
        <split streaming="true" executorServiceRef="myPool">
            <tokenize token="\n" />
            <to uri="{{inventory.queue.uri}}" />
        </split>
        <log message="Done processing file: ${header.CamelFileName}" />
    </route>
</camelContext>

inbox.uri is a file component URI listening for files in a directory, while inventory.queue.uri is a JmsComponent URI connecting to a queue on a JMS server (Tibco EMS 6.x). The JmsComponent URI is as simple as "JmsComponent:queue:?username=&password="
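For illustration, a JmsComponent for Tibco EMS can be wired in Spring roughly like this (a minimal sketch; the bean ids and serverUrl are placeholders, not my actual values):

<!-- sketch: Tibco EMS connection factory; serverUrl is a placeholder -->
<bean id="emsConnectionFactory" class="com.tibco.tibjms.TibjmsConnectionFactory">
    <property name="serverUrl" value="tcp://ems-host:7222" />
</bean>

<!-- the bean id must match the URI scheme used in the route ("JmsComponent:...") -->
<bean id="JmsComponent" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory" ref="emsConnectionFactory" />
</bean>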

The above route runs without error, but the rows split from the file are not sent to the queue as JMS messages (i.e., the queue is still empty after the program has run).

If I remove executorServiceRef="myPool" from the splitter definition (leaving just <split streaming="true">), the split messages are delivered to the JMS queue one by one.

If I replace the "to" URI with a "direct" endpoint, the split messages are delivered regardless of whether the thread pool is used in the splitter.

Is there any special setting required on the JmsComponent to make it work with Splitter + thread pool? Or is there any other configuration I have missed?

======= Edit on 20150731 =======

I was suffering from the above issue when testing with a big CSV file of 1000 rows. If I test with a small file (e.g., 10 rows only), I can see that the messages are delivered to inventory.queue, but from the log it seems to take 10 seconds to complete the split and deliver the messages to the queue. Below is the captured log:

2015-07-31 11:02:07,210 [main           ] INFO  SpringCamelContext             - Apache Camel 2.15.0 (CamelContext: concurrent-route-context) started in 1.301 seconds
2015-07-31 11:02:07,220 [main           ] INFO  MainSupport                    - Apache Camel 2.15.0 starting
2015-07-31 11:02:17,250 [://target/inbox] INFO  inbox-threadpool-split-route   - Done processing file: smallfile.csv

Note that the route started at 11:02:07 and printed the "Done processing..." statement at 11:02:17, i.e., 10 seconds later.

If I test again with a 5-row CSV, it takes 5 seconds. It seems to take 1 second per row to split and deliver to the JMS queue, which is very slow.

If I change the "to" URI to a "direct" endpoint instead of JMS, the split completes very quickly, within a second.

Also, from the JMS listener log, it received all 10 messages within the same second. It seems the Splitter reads and splits the whole file, "prepares" the 10 JMS messages for all ten rows, and only then delivers all the messages to the queue, rather than "split one row, deliver one JMS message immediately".

Are there any options or configurations that could change the Splitter's behavior and improve the split performance?

I had a similar issue while processing a 14 GB file using the splitter with tokenizing. I was able to overcome the performance hump by using an Aggregator, as pointed out in Claus's post on Parsing Large Files with Apache Camel.

After aggregating the batch messages, I used a producer template to route them to the messaging system. Hope that helps.
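A minimal sketch of that producer-template step, assuming a String batch body and a placeholder endpoint URI (neither is spelled out in my original code):

// sketch: send an aggregated batch to a JMS endpoint via a ProducerTemplate
import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;

public class BatchSender {
    private final ProducerTemplate template;

    public BatchSender(CamelContext context) {
        // ProducerTemplate is thread-safe and should be created once and reused
        this.template = context.createProducerTemplate();
    }

    public void send(String aggregatedBody) {
        // "jms:queue:inventory" is a placeholder endpoint URI
        template.sendBody("jms:queue:inventory", aggregatedBody);
    }
}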

Thanks for the reference link shared by @Aayush Tuladhar; I have updated my route as follows:

<camelContext id="concurrent-route-context" xmlns="http://camel.apache.org/schema/spring" trace="false">
    <route id="inbox-threadpool-split-route">
        <from uri="{{inbox.uri}}" />
        <log message="Starting to process file: ${header.CamelFileName}" />
        <split streaming="true" executorServiceRef="myPool">
            <tokenize token="\n" />
            <log message="split index - $simple{property.CamelSplitIndex}, row content=$simple{body}" />
            <aggregate strategyRef="stringBodyAggregator" completionInterval="750">
                <correlationExpression>
                    <simple>property.CamelSplitIndex</simple>
                </correlationExpression>
                <to uri="{{inventory.queue.uri}}" />
            </aggregate>
        </split>
        <log message="Done processing file: ${header.CamelFileName}" />
    </route>
</camelContext>

The trick here is that an aggregator was added within the splitter, which uses

property.CamelSplitIndex

as the correlationExpression. CamelSplitIndex keeps incrementing for each split row, so the aggregator doesn't actually "aggregate" anything; each "aggregation" ends immediately and the JMS message is enqueued to the JMS queue right away. The aggregationStrategy simply joins oldExchange and newExchange, but that is not important here, as it only serves to satisfy the required strategyRef attribute of the aggregate EIP.
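For completeness, a minimal sketch of what such a stringBodyAggregator bean could look like (my actual class is not shown in this post, so treat this as an assumption; the package is the Camel 2.x one):

// sketch: AggregationStrategy that joins the old and new message bodies
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class StringBodyAggregator implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // first message in a correlation group: nothing to join yet
        if (oldExchange == null) {
            return newExchange;
        }
        // join the two bodies; with CamelSplitIndex as the correlation key,
        // each group only ever holds one exchange, so this branch rarely runs
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(oldBody + newBody);
        return oldExchange;
    }
}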

One point to note: after applying this trick, the performance bottleneck shifted to the JMS message producer, which was delivering only 1 message per second. I solved that by using Spring's CachingConnectionFactory to define the JMS connection.
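A sketch of that Spring wiring, assuming the EMS connection factory bean shown earlier (the sessionCacheSize value is illustrative):

<!-- sketch: cache JMS sessions/producers so each send does not open a new connection -->
<bean id="cachingConnectionFactory" class="org.springframework.jms.connection.CachingConnectionFactory">
    <property name="targetConnectionFactory" ref="emsConnectionFactory" />
    <property name="sessionCacheSize" value="10" />
</bean>

<bean id="JmsComponent" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory" ref="cachingConnectionFactory" />
</bean>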
