
Using Spring Batch to read a file and write to a Map

Background

I am fairly new to Spring Batch and have the following requirements:

  1. Read a file with a minimum of a million records (CSV, pipe-delimited, etc.)
  2. Load each row of the file into a Map whose key is the first column and whose value is a domain object/POJO.

I understand that Spring Batch has something known as chunk-oriented processing, where one configures a reader, a processor, and a writer to process a certain number of records governed by the commit-interval. This can be scaled further by using a task executor for the reader or by adding another layer of multithreading through partitioning.
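For reference, here is a minimal sketch of such a chunk-oriented step in Java configuration, assuming @EnableBatchProcessing is in place elsewhere; the file name input.csv, the column names id and name, and the chunk size of 1000 are all assumptions:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class LoadFileConfig {

    // Reader for a pipe-delimited file; the file name and column names are assumptions
    @Bean
    public FlatFileItemReader<SomePOJO> reader() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer("|");
        tokenizer.setNames(new String[] {"id", "name"});

        BeanWrapperFieldSetMapper<SomePOJO> fieldSetMapper = new BeanWrapperFieldSetMapper<SomePOJO>();
        fieldSetMapper.setTargetType(SomePOJO.class);

        DefaultLineMapper<SomePOJO> lineMapper = new DefaultLineMapper<SomePOJO>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<SomePOJO> reader = new FlatFileItemReader<SomePOJO>();
        reader.setResource(new FileSystemResource("input.csv"));
        reader.setLineMapper(lineMapper);
        return reader;
    }

    // Chunk-oriented step: the chunk size (1000 here) is the commit-interval
    @Bean
    public Step loadFileStep(StepBuilderFactory steps, MapItemWriter writer) {
        return steps.get("loadFileStep")
                .<SomePOJO, SomePOJO>chunk(1000)
                .reader(reader())
                .writer(writer)
                .build();
    }
}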

Question

As explained in point 2 above, I want to load my file into a Map. For the sake of discussion, let's say I implement the following ItemWriter, which aggregates the chunks into a Map.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.item.ItemWriter;

public class MapItemWriter implements ItemWriter<SomePOJO> {

    private final Map<String, SomePOJO> somePojoMap;

    public MapItemWriter() {
        System.out.println("Writer created ");
        somePojoMap = new ConcurrentHashMap<String, SomePOJO>();
    }

    @Override
    public void write(List<? extends SomePOJO> items) throws Exception {
        if (items != null && !items.isEmpty()) {
            for (SomePOJO data : items) {
                // Assumes SomePOJO exposes its unique id via getId()
                String uniqueId = data.getId();
                somePojoMap.put(uniqueId, data);
            }
        }
    }

    public Map<String, SomePOJO> getSomePojoMap() {
        return somePojoMap;
    }
}

Since I have access to my ItemWriter bean, I can later call getSomePojoMap to get the aggregated Map of the records in my file. However, holding a Map like this in the ItemWriter doesn't feel like the best way to go about it. Another concern is that using a ConcurrentHashMap may degrade performance, but I don't see any other way to aggregate the file into a Map in a thread-safe manner.

Is there a better way to aggregate my file into a Map than holding a Map in my writer and using a ConcurrentHashMap?

That's more or less it. You could make small improvements, such as putting the map in a separate bean, which would allow the writer bean and the map to have different lifetimes and would also decouple the readers of the map from the writer. For instance, you could put the map in a job-scoped bean while keeping the writer a singleton, as in the sketch below.
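A minimal sketch of that arrangement, assuming MapItemWriter is changed to receive the map through its constructor (the configuration class and bean names are hypothetical):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.core.configuration.annotation.JobScope;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MapHolderConfig {

    // One map per job execution; the singleton writer sees a scoped proxy
    // that resolves to the current execution's map at runtime
    @Bean
    @JobScope
    public Map<String, SomePOJO> somePojoMap() {
        return new ConcurrentHashMap<String, SomePOJO>();
    }

    @Bean
    public MapItemWriter mapItemWriter(Map<String, SomePOJO> somePojoMap) {
        return new MapItemWriter(somePojoMap);
    }
}

Because the map bean is job-scoped, each job execution gets a fresh map, while the writer itself can remain a singleton.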

You only need a ConcurrentHashMap if your job is partitioned across multiple threads (I'm assuming you don't want the map shared across jobs).

Why not use a file item writer?

I assume that this map should eventually be written to a file, probably a flat (txt) file.

If that is the case, try FlatFileItemWriter. If you need to write the data to an XML file instead, you can use StaxEventItemWriter. A minimal example follows.
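Here is a minimal FlatFileItemWriter sketch; the output path output.txt and the field names id and name are assumptions about SomePOJO:

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.core.io.FileSystemResource;

// Writes each SomePOJO as a pipe-delimited line to output.txt
public FlatFileItemWriter<SomePOJO> fileWriter() {
    BeanWrapperFieldExtractor<SomePOJO> extractor = new BeanWrapperFieldExtractor<SomePOJO>();
    extractor.setNames(new String[] {"id", "name"});

    DelimitedLineAggregator<SomePOJO> aggregator = new DelimitedLineAggregator<SomePOJO>();
    aggregator.setDelimiter("|");
    aggregator.setFieldExtractor(extractor);

    FlatFileItemWriter<SomePOJO> writer = new FlatFileItemWriter<SomePOJO>();
    writer.setResource(new FileSystemResource("output.txt"));
    writer.setLineAggregator(aggregator);
    return writer;
}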

Even if you don't need to write the data to a file (you only need the map at the end of the batch processing), I think it will be "cheaper" to write the data to a file and afterwards read the whole map back from the file. Saving the map within the job scope means that the object is persisted to the DB on every chunk and retrieved from the DB on every chunk, which is quite an expensive operation.
