I currently have a Spring Batch Job with a single step that reads data from Oracle, passes the data through multiple Spring Batch processors (CompositeItemProcessor) and writes the data to different destinations such as Oracle and files (CompositeItemWriter):
<batch:step id="dataTransformationJob">
    <batch:tasklet transaction-manager="transactionManager" task-executor="taskExecutor" throttle-limit="30">
        <batch:chunk reader="dataReader" processor="compositeDataProcessor" writer="compositeItemWriter" commit-interval="100"></batch:chunk>
    </batch:tasklet>
</batch:step>
In the above step, the compositeItemWriter is configured with 2 writers that run one after another and write 100 million records to Oracle as well as a file. Also, the dataReader has a synchronized read method to ensure that multiple threads don't read the same data from Oracle. This job currently takes 1 hour 30 minutes to complete.
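For context, the synchronized read mentioned above can be pictured roughly as follows. This is only a minimal sketch, not the actual dataReader: the class name SynchronizedReaderSketch is hypothetical, and a plain Iterator stands in for the Oracle cursor. It follows the Spring Batch reader convention of returning null once the input is exhausted. Spring Batch also ships a SynchronizedItemStreamReader that wraps a delegate in a similar way.

```java
import java.util.Iterator;

/**
 * Hypothetical stand-in for a thread-safe reader (not the Spring Batch API).
 * An Iterator plays the role of the underlying Oracle cursor.
 */
class SynchronizedReaderSketch<T> {
    private final Iterator<T> source;

    SynchronizedReaderSketch(Iterator<T> source) {
        this.source = source;
    }

    /**
     * Only one thread at a time can advance the source, so no two
     * threads ever receive the same item. Returns null when exhausted.
     */
    synchronized T read() {
        return source.hasNext() ? source.next() : null;
    }
}
```

Because every worker thread has to pass through this single lock, the reader is a natural serialization point in a multi-threaded step.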
I am planning to break down the above job into two parts such that the reader/processors produce data on 2 Kafka topics (one for data to be written to Oracle and the other for data to be written to a file). On the other side of the equation, I will have a job with two parallel flows that read data from each topic and write the data to Oracle and file respectively.
With the above architecture in mind, I want to understand how I can refactor a Spring Batch Job to use Kafka. I believe the following are the areas I would need to address:
CompositeItemWriter will be called for every 100 records, and each writer will unpack the chunk and call the write method on it. Does this mean that when I write to Kafka, there will be 100 publish calls to Kafka?
Note: I am aware of Kafka Connect but don't want to use it because it requires setting up a Connect cluster, and I don't have the infrastructure available to support that.
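To make the question concrete, here is a minimal sketch of what such a writer might look like. Everything in it is an assumption for illustration: RecordPublisher is a hypothetical interface standing in for a Kafka producer, and TwoTopicChunkWriter, CountingPublisher and the topic names are made up. It only shows that a chunk of 100 items leads to one send call per item per topic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical publisher abstraction standing in for a Kafka producer. */
interface RecordPublisher {
    void send(String topic, String value);
}

/** Sketch of a chunk writer that publishes every item to two topics. */
class TwoTopicChunkWriter {
    private final RecordPublisher publisher;
    private final String oracleTopic;
    private final String fileTopic;

    TwoTopicChunkWriter(RecordPublisher publisher, String oracleTopic, String fileTopic) {
        this.publisher = publisher;
        this.oracleTopic = oracleTopic;
        this.fileTopic = fileTopic;
    }

    /** Called once per chunk, e.g. with 100 items when commit-interval="100". */
    void write(List<String> chunk) {
        for (String item : chunk) {
            publisher.send(oracleTopic, item); // record destined for the Oracle writer job
            publisher.send(fileTopic, item);   // record destined for the file writer job
        }
    }
}

/** In-memory publisher that just records each call, for illustration only. */
class CountingPublisher implements RecordPublisher {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String topic, String value) {
        sent.add(topic + ":" + value);
    }
}
```

With a chunk of 100 items, CountingPublisher ends up with 200 entries. Note that a send call on a real Kafka producer is not necessarily a network round trip, because the producer buffers records before transmitting them.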
Answers to your questions:
The concern about combining multiple rows into one single message in Kafka to avoid multiple network calls is invalid, since multiple messages (rows) can be produced/consumed in a single network call. For your first draft, I would suggest keeping it simple by having a single row correspond to a single message.
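As a rough illustration of how that batching is controlled on the producer side, the sketch below builds a set of producer properties using only java.util.Properties. The keys (linger.ms, batch.size, compression.type and so on) are standard Kafka producer settings; the class name ProducerBatchingConfig and all the values are assumptions chosen for illustration, not tuned recommendations.

```java
import java.util.Properties;

/** Hypothetical helper that assembles producer settings related to batching. */
class ProducerBatchingConfig {

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait up to 20 ms for more records before sending a batch (illustrative value).
        props.setProperty("linger.ms", "20");
        // Upper bound, in bytes, for one per-partition batch (illustrative value).
        props.setProperty("batch.size", "65536");
        // Compress each batch as a whole before it goes over the network.
        props.setProperty("compression.type", "lz4");
        return props;
    }
}
```

These properties could then be passed to a KafkaProducer constructor (or the equivalent Spring Kafka producer factory configuration). Each row stays a single message, while the producer groups messages bound for the same partition into one request.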