
Processing a large volume of data

The question goes like this.

From one application I receive approximately 200,000 encrypted values. The task is to:

  1. Read all the encrypted values into one VO/list.
  2. Reformat them and add headers/trailers.
  3. Dump the records to the DB in one shot, with the header and trailer in separate, defined columns.

I don't want to use any intermediate file between the processes. What would be the best way to store a list of 200,000 records, and how can I dump those records into the DB in one shot? Is it better to divide the work into chunks and use separate threads to process them? Please suggest a solution that takes as little time as possible.

I am using Spring Batch for this, and the whole process will be one job.

Spring Batch is made for exactly this type of operation. You will want a chunk-oriented step. This type of step uses a reader, an item processor, and a writer. It also streams the data, so you never have all the items in memory at one time.
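As a rough sketch of such a step, assuming Spring Batch's `StepBuilderFactory` API (the step name, chunk size, and the `EncryptedValue`/`RecordRow` types are illustrative, not from the question):

```java
// Hypothetical chunk-oriented step: reads EncryptedValue items,
// reformats them into RecordRow items, and writes them in batches
// of 1000, so only one chunk is held in memory at a time.
@Bean
public Step loadStep(StepBuilderFactory steps,
                     ItemReader<EncryptedValue> reader,
                     ItemProcessor<EncryptedValue, RecordRow> processor,
                     ItemWriter<RecordRow> writer) {
    return steps.get("loadStep")
            .<EncryptedValue, RecordRow>chunk(1000)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
```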

I'm not sure of the incoming format of your data, but there are existing readers for pretty much any use case, and if you can't find the type you need, you can create your own. You will then want to implement ItemProcessor to handle any modifications you need to make.

For writing, you can just use JdbcBatchItemWriter.
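A minimal writer configuration might look like this (the table and column names are assumptions for illustration; the named parameters bind to the item's bean properties):

```java
// Hypothetical JdbcBatchItemWriter: inserts each RecordRow with the
// header and trailer in their own columns, batched per chunk.
@Bean
public JdbcBatchItemWriter<RecordRow> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<RecordRow>()
            .dataSource(dataSource)
            .sql("INSERT INTO records (header, payload, trailer) "
               + "VALUES (:header, :payload, :trailer)")
            .beanMapped() // map :header etc. to RecordRow getters
            .build();
}
```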

As for the headers/footers, I would need more details. If they are an aggregation of all the records, you will need to compute them beforehand. You can put the end results into the ExecutionContext.
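For example, a StepExecutionListener could promote an aggregate into the job's ExecutionContext after the step, for a later step to pick up (the key name here is made up):

```java
// Hypothetical listener: after the step finishes, stash an aggregate
// (e.g. the written-record count for the trailer) in the job-level
// ExecutionContext so a subsequent step can read it.
public class TrailerListener implements StepExecutionListener {
    @Override
    public void beforeStep(StepExecution stepExecution) { }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        long count = stepExecution.getWriteCount();
        stepExecution.getJobExecution().getExecutionContext()
                .putLong("trailer.recordCount", count);
        return stepExecution.getExitStatus();
    }
}
```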

There are a couple of generic tricks to make bulk insertion go faster:

  • Consider using the database's native bulk-insert facility.

  • Sort the records into ascending order on the primary key before you insert them.

  • If you are inserting into an empty table, drop the secondary indexes first and recreate them afterwards.

  • Don't do it all in one database transaction.

I don't know how well these tricks translate to spring-batch, but if they don't, you could consider bypassing spring-batch and going directly to the database.
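Two of the tricks above, sorting on the primary key and committing in chunks rather than in one giant transaction, can be sketched in plain Java (the `Row` shape and chunk size are illustrative). Each chunk would then be sent via a PreparedStatement's `addBatch()`/`executeBatch()` followed by a `commit()`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BulkInsertPrep {

    // Illustrative record shape with a numeric primary key.
    public record Row(long id, String payload) { }

    // Sort rows ascending by primary key, then split them into
    // commit-sized chunks; each chunk maps to one executeBatch()
    // plus one commit() against the database.
    public static List<List<Row>> toSortedChunks(List<Row> rows, int chunkSize) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong(Row::id));
        List<List<Row>> chunks = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += chunkSize) {
            chunks.add(sorted.subList(i, Math.min(i + chunkSize, sorted.size())));
        }
        return chunks;
    }
}
```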

