
Java Spring: How to efficiently read and save a large amount of data from a CSV file?

I am developing a web application in Java Spring where I want the user to be able to upload a CSV file from the front-end, see the real-time progress of the import, and afterwards search individual entries in the imported data.

The import consists of uploading the file (sending it via a REST API POST request), then reading it and saving its contents to a database so that the user can search the data.

What would be the fastest way to save the data to the database? Just looping over the lines, creating a new object for each line, and saving it via JpaRepository takes too much time: around 90 s for 10,000 lines, i.e. about 9 ms per row. I need to make it a lot faster. I need to add 200k rows in a reasonable amount of time (at the current rate that would take roughly 30 minutes).

Side notes:

I saw an asynchronous approach using Reactor. This should be faster, as it uses multiple threads, and the order in which the rows are saved basically isn't important (although the data has IDs in the CSV).

I also saw Spring Batch jobs, but all of the examples use SQL. I am using repositories, so I'm not sure whether I can use it or whether it's the best approach.

This GitHub repo compares five different methods of batch-inserting data. According to its author, using JdbcTemplate is the fastest (he claims 500,000 records in 1.79 [± 0.50] seconds). If you use JdbcTemplate with Spring Data, you'll need to create a custom repository; see this section in the docs for detailed instructions.
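JdbcTemplate's batchUpdate methods build on JDBC statement batching, so the idea can be sketched with plain JDBC and no Spring dependency. This is only an illustration, not the code from the repo mentioned above: the `person` table, its `id` and `name` columns, the `{id, name}` row layout, and the class and method names are all assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class JdbcBatchInsertSketch {

    // Hypothetical table and columns; adjust to the real schema.
    static final String INSERT_SQL = "INSERT INTO person (id, name) VALUES (?, ?)";

    /** Splits rows into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(from, to)));
        }
        return chunks;
    }

    /**
     * Inserts all rows in one transaction, sending one JDBC batch per chunk.
     * Each row is assumed to be {id, name}.
     */
    static void insertAll(Connection connection, List<String[]> rows, int chunkSize)
            throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (List<String[]> chunk : partition(rows, chunkSize)) {
                for (String[] row : chunk) {
                    statement.setLong(1, Long.parseLong(row[0]));
                    statement.setString(2, row[1]);
                    statement.addBatch();      // queue the row instead of executing it
                }
                statement.executeBatch();      // send the whole chunk to the driver
            }
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            connection.rollback();             // discard partial work before rethrowing
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With a real `Connection`, a call such as `JdbcBatchInsertSketch.insertAll(connection, rows, 1000)` would be the part to time against the per-row repository save; the chunk size of 1000 is only a starting guess.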

Spring Data's CrudRepository has a save method that takes an Iterable (named saveAll since Spring Data 2.0), so you can use that too, although you'll have to time it to see how it performs against JdbcTemplate. Using Spring Data, the steps are as follows (taken from here, with some edits):

  1. Add rewriteBatchedStatements=true to the end of the connection string (this is a MySQL Connector/J option).
  2. Make sure you use a generator that supports batching in your entity (Hibernate disables JDBC insert batching for IDENTITY generators), e.g.

     @Id @GeneratedValue(generator = "generator") @GenericGenerator(name = "generator", strategy = "increment")
  3. Use the save(Iterable<S> entities) method of the CrudRepository to save the data.

  4. Set the hibernate.jdbc.batch_size configuration property.
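Steps 3 and 4 pay off when rows are handed over in chunks rather than one at a time. Below is a minimal sketch of that chunking, assuming a simple CSV with no header row and no quoted commas. `CsvChunkReader` and `forEachChunk` are hypothetical names, and the `Consumer` stands in for code that maps each row to an entity and calls something like `repository.save(entities)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CsvChunkReader {

    /**
     * Reads CSV lines and passes them to batchSink in chunks of at most
     * chunkSize rows. Returns the total number of rows read.
     * Assumption: fields are split naively on ',' (no quoting support).
     */
    static long forEachChunk(Reader source, int chunkSize, Consumer<List<String[]>> batchSink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0;
        List<String[]> chunk = new ArrayList<>(chunkSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;                      // skip blank lines
                }
                chunk.add(line.split(",", -1));    // -1 keeps trailing empty fields
                total++;
                if (chunk.size() == chunkSize) {
                    batchSink.accept(chunk);       // hand over a full chunk
                    chunk = new ArrayList<>(chunkSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!chunk.isEmpty()) {
            batchSink.accept(chunk);               // hand over the final partial chunk
        }
        return total;
    }
}
```

Choosing a chunk size equal to (or a multiple of) hibernate.jdbc.batch_size seems a reasonable starting point, though that is a guess to be confirmed by timing. Because the sink is called once per chunk, it is also a convenient place to update a progress counter for the front-end.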

The code for solution #2 is here.

As for using multiple threads, remember that writing to the same table from multiple threads may cause table-level contention and make the results worse. You will have to try it and time it. How to write multithreaded code with Project Reactor is a completely separate topic that is out of scope here.
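Since the effect of parallel writes has to be measured rather than assumed, a small timing harness can help. The sketch below uses a plain `ExecutorService` rather than Reactor, and the `Consumer` again stands in for whatever actually writes one batch to the database. `ParallelBatchTimer` and `runAndTime` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelBatchTimer {

    /**
     * Submits every batch to a fixed pool of threadCount threads, waits for
     * all of them to finish, and returns the elapsed time in nanoseconds.
     */
    static <T> long runAndTime(List<List<T>> batches, int threadCount,
                               Consumer<List<T>> batchWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : batches) {
                pending.add(pool.submit(() -> batchWriter.accept(batch)));
            }
            for (Future<?> future : pending) {
                future.get();                      // surfaces any failure from the writer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return System.nanoTime() - start;
    }
}
```

Running it once with `threadCount` set to 1 and again with, say, 4 against the real database gives a direct comparison; if the multi-threaded run is not clearly faster, contention is probably eating the gain.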

HTH.

If you are using SQL Server, simply create an SSIS package that watches for the file and, when it shows up, loads it and then renames it. That makes it a build-once, run-a-million-times solution, and SSIS can load a ton of data fairly fast. Rick
