
Java Spring: How to efficiently read and save large amount of data from a CSV file?

I am developing a web application in Java Spring where the user should be able to upload a CSV file from the front end, watch the progress of the import in real time, and, once the import has finished, search for individual entries in the imported data.

The import process consists of uploading the file (sending it via a REST API POST request), then reading it and saving its contents to a database so that the user can search the data.

What would be the fastest way to save the data to the database? Looping over the lines, creating a new object for each line and saving it via a JpaRepository takes too much time: around 90 seconds for 10,000 lines. I need to make it a lot faster, because I need to add 200k rows in a reasonable amount of time.

Side Notes:

I saw an asynchronous approach using Reactor. This should be faster because it uses multiple threads, and the order in which the rows are saved isn't really important (although the data has IDs in the CSV).

I also saw Spring Batch jobs, but all of the examples use SQL. I am using repositories, so I'm not sure whether I can use it or whether it's the best approach.

This GitHub repo compares five different methods of batch-inserting data. According to its author, using JdbcTemplate is the fastest (he claims 500,000 records in 1.79 [± 0.50] seconds). If you use JdbcTemplate with Spring Data, you'll need to create a custom repository; see this section in the docs for detailed instructions.
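To make the batching idea concrete, here is a minimal sketch using plain JDBC (`PreparedStatement.addBatch` / `executeBatch`) rather than JdbcTemplate itself, so that it needs nothing beyond the JDK. The table name `entry`, its columns `id` and `name`, and the class name `JdbcBatchInserter` are assumptions for illustration; the question does not say what the CSV contains. Transaction handling is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchInserter {
    // Placeholder table and columns: adjust to the real schema.
    private static final String SQL = "INSERT INTO entry (id, name) VALUES (?, ?)";

    /**
     * Inserts the parsed CSV rows (row[0] = id, row[1] = name, both assumed),
     * sending them to the database in groups of batchSize.
     * Returns the number of batches that were executed.
     */
    static int insertAll(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setLong(1, Long.parseLong(row[0]));
                ps.setString(2, row[1]);
                ps.addBatch();               // queue the row instead of executing it
                if (++pending == batchSize) {
                    ps.executeBatch();       // one round trip for the whole group
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {               // flush the final, partially filled batch
                ps.executeBatch();
                batches++;
            }
        }
        return batches;
    }
}
```

With JdbcTemplate the same loop would typically be replaced by one of its `batchUpdate` overloads; which batch size works best is something to measure against your own database.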

Spring Data's CrudRepository has a save method that takes an Iterable (renamed saveAll in Spring Data 2.0 and later), so you can use that too, although you'll have to time it to see how it performs against JdbcTemplate. Using Spring Data, the steps are as follows (taken from here, with some edits):

  1. Add rewriteBatchedStatements=true to the end of the connection string (this is a MySQL Connector/J option).
  2. Make sure your entity uses an ID generator that supports batching (Hibernate cannot batch inserts for IDENTITY-generated IDs), e.g.

     @Id @GeneratedValue(generator = "generator") @GenericGenerator(name = "generator", strategy = "increment")
  3. Use the save(Iterable<S> entities) method of the CrudRepository to save the data.

  4. Set the hibernate.jdbc.batch_size configuration property.
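The steps above only pay off if the rows reach the repository in groups rather than one at a time. A minimal sketch of that chunking, using only the JDK, might look like the following. The class name `CsvChunkReader`, the assumption that the file has a header line, and the naive comma split (no quoted fields) are all illustrative choices, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CsvChunkReader {
    /**
     * Reads CSV rows from source, skips the first (header) line and blank lines,
     * and hands the parsed rows to sink in lists of at most chunkSize.
     * Returns the total number of data rows read.
     */
    static int forEachChunk(Reader source, int chunkSize, Consumer<List<String[]>> sink)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();     // header line, assumed present
            List<String[]> chunk = new ArrayList<>(chunkSize);
            int total = 0;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;                    // ignore blank lines
                }
                chunk.add(line.split(",", -1));  // naive split: no quoted fields
                total++;
                if (chunk.size() == chunkSize) {
                    sink.accept(chunk);          // e.g. map to entities and save them
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                sink.accept(chunk);              // final, partially filled chunk
            }
            return total;
        }
    }
}
```

In a Spring application the `sink` would presumably convert each chunk to entities and pass them to the repository's batch save method; it is also a natural place to update a progress counter after each chunk. A chunk size equal to (or a multiple of) hibernate.jdbc.batch_size is a reasonable starting point to time.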

The code for solution #2 is here.

As for using multiple threads, remember that writing to the same table from multiple threads may cause table-level contention and give worse results, so you will have to try it and time it. How to write multithreaded code with Project Reactor is a completely separate topic that is out of scope here.
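One way to "try and time it" is a small harness that runs the same save function over the same chunks with different thread counts. The sketch below uses a plain `ExecutorService` instead of Reactor to keep it JDK-only; `ParallelSaveTimer` and the `saver` callback are hypothetical names, and the callback stands in for whatever actually writes a chunk to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelSaveTimer {
    /**
     * Runs saver once per chunk on a pool of the given size, waits for all
     * chunks to finish, and returns the elapsed wall-clock time in milliseconds.
     */
    static <T> long timeSave(List<List<T>> chunks, Consumer<List<T>> saver, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> chunk : chunks) {
                pending.add(pool.submit(() -> saver.accept(chunk)));
            }
            for (Future<?> f : pending) {
                f.get();                 // wait, and rethrow any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling `timeSave` with `threads` set to 1 and then to, say, 4 against the real database gives a direct comparison; if the multithreaded run is not clearly faster, contention is probably eating the gain.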

HTH.

If you are using SQL Server, simply create an SSIS package that watches for the file and, when it shows up, loads it and then renames it. That makes it build once, run a million times, and SSIS can load a ton of data fairly fast. Rick

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 