
Using Spring Batch to read a file and write to a Map

Background

I am fairly new to Spring batch and have the following requirement :

  1. Read a file with a minimum of a million records (CSV, pipe-delimited, etc.)
  2. Load each row in the file into a Map with key as the first column and value as a domain object/POJO.

I understand that Spring Batch has something known as chunk-oriented processing, where one configures a reader, a processor and a writer to process a certain number of records governed by the commit-interval. This can be scaled further by using a task executor for the reader or by adding another layer of multithreading through partitioning.
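For reference, a chunk-oriented step configured in Java might look roughly like the sketch below; the step name, the reader/writer bean names and the commit interval of 1000 are just assumptions for illustration, with SomePOJO being the domain object from the question.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LoadFileStepConfig {

    // Chunk-oriented step: the reader is called until 1000 items (the commit
    // interval) have been accumulated, then the whole chunk is passed to the writer.
    @Bean
    public Step loadFileStep(StepBuilderFactory steps,
                             ItemReader<SomePOJO> fileReader,
                             ItemWriter<SomePOJO> mapWriter) {
        return steps.get("loadFileStep")
                .<SomePOJO, SomePOJO>chunk(1000)
                .reader(fileReader)
                .writer(mapWriter)
                .build();
    }
}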

Question

As explained in point 2 above, I want to load my file into a Map. For the sake of discussion, let's say I implement the following ItemWriter that aggregates the chunks into a Map.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.item.ItemWriter;

public class MapItemWriter implements ItemWriter<SomePOJO> {

    private final Map<String, SomePOJO> somePojoMap;

    public MapItemWriter() {
        System.out.println("Writer created");
        somePojoMap = new ConcurrentHashMap<String, SomePOJO>();
    }

    @Override
    public void write(List<? extends SomePOJO> items) throws Exception {
        if (items != null && !items.isEmpty()) {
            for (SomePOJO data : items) {
                // assuming SomePOJO exposes its unique key via getId()
                String uniqueId = data.getId();
                somePojoMap.put(uniqueId, data);
            }
        }
    }

    public Map<String, SomePOJO> getSomePojoMap() {
        return somePojoMap;
    }
}

Since I have access to my ItemWriter bean, I can later call getSomePojoMap to get the aggregated Map of records in my file; however, holding a Map like this in the ItemWriter doesn't feel like the best way to go about this. Another concern is that the use of a ConcurrentHashMap may degrade performance, but I don't see any other way to aggregate the file into a Map in a thread-safe manner.

Is there a better way to aggregate my file into a Map rather than holding a Map in my writer and using a ConcurrentHashMap?

That's more or less it. You could make small improvements, like putting the map in a separate bean, which would allow you to give the writer bean and the map different lifetimes and also decouple the readers of the map from the writer. For instance, you could put the map in a job-scoped bean and still have the writer be a singleton, as sketched below.
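A minimal sketch of that idea, using a hypothetical PojoMapHolder class and bean names that are not from the original post; the singleton writer would have the (proxied) holder injected and call getMap() inside write(), while whoever consumes the result reads from the same holder:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.core.configuration.annotation.JobScope;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MapHolderConfig {

    // Hypothetical holder that owns the aggregated map, decoupled from the writer.
    public static class PojoMapHolder {

        private final Map<String, SomePOJO> map = new ConcurrentHashMap<String, SomePOJO>();

        public Map<String, SomePOJO> getMap() {
            return map;
        }
    }

    // One fresh holder per job execution; the singleton writer only touches
    // getMap() while a job is running, so the job scope is always active.
    @Bean
    @JobScope
    public PojoMapHolder pojoMapHolder() {
        return new PojoMapHolder();
    }
}

Because the holder is job scoped, each run starts with an empty map, and the writer bean itself stays stateless.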

You only need a ConcurrentHashMap if your job is partitioned into multiple threads (I'm assuming you don't want the map shared across jobs).

Why not use a file item writer?

I assume that this map should be written to a file, probably a flat file (.txt).

If this is the case, try using FlatFileItemWriter. If you need to write this data to an XML file, you can use StaxEventItemWriter instead.
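For example, a pipe-delimited FlatFileItemWriter could be wired up roughly like this; the output path and the SomePOJO property names ("id", "name") are assumptions for illustration:

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.core.io.FileSystemResource;

public class FileWriterConfig {

    public FlatFileItemWriter<SomePOJO> flatFileWriter() {
        FlatFileItemWriter<SomePOJO> writer = new FlatFileItemWriter<SomePOJO>();
        // Output location is an assumption; point it wherever the file should go.
        writer.setResource(new FileSystemResource("target/somepojo-output.txt"));

        // Extract the POJO properties to write; "id" and "name" are assumed names.
        BeanWrapperFieldExtractor<SomePOJO> extractor = new BeanWrapperFieldExtractor<SomePOJO>();
        extractor.setNames(new String[] {"id", "name"});

        // Join the extracted fields with a pipe, matching the input format.
        DelimitedLineAggregator<SomePOJO> aggregator = new DelimitedLineAggregator<SomePOJO>();
        aggregator.setDelimiter("|");
        aggregator.setFieldExtractor(extractor);

        writer.setLineAggregator(aggregator);
        return writer;
    }
}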

Even if you don't need to write the data to a file (you only need the map at the end of the batch processing), I think it will be "cheaper" to write the data to a file and afterwards read the whole map back from the file. Saving the map within the job scope means that this object will be persisted to the database on every chunk and retrieved from the database on every chunk, which is quite an expensive operation.
