
Using Spring Batch to read a file and write to a Map

Background

I am fairly new to Spring Batch and have the following requirements:

  1. Read a file with a minimum of a million records (CSV, pipe-delimited, etc.)
  2. Load each row of the file into a Map whose key is the first column and whose value is a domain object/POJO.

I understand that Spring Batch has something known as chunk-oriented processing, where one configures a reader, a processor, and a writer to process a certain number of records governed by the commit-interval. This can be scaled further by using a task executor for the reader or by adding another layer of multithreading through partitioning.
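For reference, here is a minimal sketch of such a chunk-oriented step in Java configuration, assuming @EnableBatchProcessing is in place elsewhere; the file name input.csv, the column names id and name, and the chunk size of 1000 are all assumptions:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class LoadFileConfig {

    // Reader for a pipe-delimited file; the file name and column names are assumptions
    @Bean
    public FlatFileItemReader<SomePOJO> reader() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer("|");
        tokenizer.setNames(new String[] {"id", "name"});

        BeanWrapperFieldSetMapper<SomePOJO> fieldSetMapper = new BeanWrapperFieldSetMapper<SomePOJO>();
        fieldSetMapper.setTargetType(SomePOJO.class);

        DefaultLineMapper<SomePOJO> lineMapper = new DefaultLineMapper<SomePOJO>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<SomePOJO> reader = new FlatFileItemReader<SomePOJO>();
        reader.setResource(new FileSystemResource("input.csv"));
        reader.setLineMapper(lineMapper);
        return reader;
    }

    // Chunk-oriented step: the chunk size (1000 here) is the commit-interval
    @Bean
    public Step loadFileStep(StepBuilderFactory steps, MapItemWriter writer) {
        return steps.get("loadFileStep")
                .<SomePOJO, SomePOJO>chunk(1000)
                .reader(reader())
                .writer(writer)
                .build();
    }
}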

Question

As explained in point 2 above, I want to load my file into a Map. For the sake of discussion, let's say I implement the following ItemWriter, which aggregates the chunks into a Map.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.item.ItemWriter;

public class MapItemWriter implements ItemWriter<SomePOJO> {

    private final Map<String, SomePOJO> somePojoMap;

    public MapItemWriter() {
        System.out.println("Writer created ");
        somePojoMap = new ConcurrentHashMap<String, SomePOJO>();
    }

    @Override
    public void write(List<? extends SomePOJO> items) throws Exception {
        if (items != null && !items.isEmpty()) {
            for (SomePOJO data : items) {
                // Assumes SomePOJO exposes its unique id via getId()
                String uniqueId = data.getId();
                somePojoMap.put(uniqueId, data);
            }
        }
    }

    public Map<String, SomePOJO> getSomePojoMap() {
        return somePojoMap;
    }
}

Since I have access to my ItemWriter bean, I can later call getSomePojoMap to get the aggregated Map of the records in my file. However, holding a Map like this in the ItemWriter doesn't feel like the best way to go about it. Another concern is that using a ConcurrentHashMap may degrade performance, but I don't see any other way to aggregate the file into a Map in a thread-safe manner.

Is there a better way to aggregate my file into a Map than holding a Map in my writer and using a ConcurrentHashMap?

That's more or less it. You could make small improvements, such as putting the map in a separate bean, which would allow the writer bean and the map to have different lifetimes and would also decouple the readers of the map from the writer. For instance, you could put the map in a job-scoped bean while keeping the writer a singleton, as in the sketch below.
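A minimal sketch of that arrangement, assuming MapItemWriter is changed to receive the map through its constructor (the configuration class and bean names are hypothetical):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.core.configuration.annotation.JobScope;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MapHolderConfig {

    // One map per job execution; the singleton writer sees a scoped proxy
    // that resolves to the current execution's map at runtime
    @Bean
    @JobScope
    public Map<String, SomePOJO> somePojoMap() {
        return new ConcurrentHashMap<String, SomePOJO>();
    }

    @Bean
    public MapItemWriter mapItemWriter(Map<String, SomePOJO> somePojoMap) {
        return new MapItemWriter(somePojoMap);
    }
}

Because the map bean is job-scoped, each job execution gets a fresh map, while the writer itself can remain a singleton.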

You only need a ConcurrentHashMap if your job is partitioned across multiple threads (I'm assuming you don't want the map shared across jobs).

Why not use a file item writer?

I assume that this map should eventually be written to a file, probably a flat (txt) file.

If that is the case, try FlatFileItemWriter. If you need to write the data to an XML file instead, you can use StaxEventItemWriter. A minimal example follows.
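Here is a minimal FlatFileItemWriter sketch; the output path output.txt and the field names id and name are assumptions about SomePOJO:

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.core.io.FileSystemResource;

// Writes each SomePOJO as a pipe-delimited line to output.txt
public FlatFileItemWriter<SomePOJO> fileWriter() {
    BeanWrapperFieldExtractor<SomePOJO> extractor = new BeanWrapperFieldExtractor<SomePOJO>();
    extractor.setNames(new String[] {"id", "name"});

    DelimitedLineAggregator<SomePOJO> aggregator = new DelimitedLineAggregator<SomePOJO>();
    aggregator.setDelimiter("|");
    aggregator.setFieldExtractor(extractor);

    FlatFileItemWriter<SomePOJO> writer = new FlatFileItemWriter<SomePOJO>();
    writer.setResource(new FileSystemResource("output.txt"));
    writer.setLineAggregator(aggregator);
    return writer;
}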

Even if you don't need to write the data to a file (you only need the map at the end of the batch processing), I think it will be "cheaper" to write the data to a file and afterwards read the whole map back from the file. Saving the map within the job scope means that the object is persisted to the DB on every chunk and retrieved from the DB on every chunk, which is quite an expensive operation.
