Which is faster for saving in a Spring repository?

I need to save a List of about 2,500,000 elements. Which is faster?

repository.saveAll(list);

or

list.parallelStream().forEach(e -> repository.save(e));

The saveAll() method will be faster. saveAll also iterates over the list and calls save for each element, so you might expect the performance to be similar. However, the default propagation type for @Transactional is REQUIRED: a call joins the current transaction if one exists and otherwise starts a new one. Called outside a transaction, saveAll runs in a single transaction, whereas calling save per element creates as many transactions as the list has elements. Hence the performance gap between the two.

Inspired by your question, I did a little experiment. It inserts 10M records in 10 seconds on my machine. You can use the code if you want.

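import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Random;

class Person {
  public int number;
  public String name;
}

// Each thread opens its own connection and inserts its share of the data
// using multi-row INSERT statements of up to COUNT rows each.
class Cannon extends Thread {
  private static final int COUNT = 5000; // rows per INSERT statement
  private final Collection<Person> people;

  public Cannon(Collection<Person> input) {
    people = input;
  }

  // Builds "insert ... values (?,?),(?,?),..." with the given number of rows.
  private static String insertSql(int rows) {
    var builder = new StringBuilder("insert into person (name, number) values (?,?)");
    for (int i = 1; i < rows; i++) {
      builder.append(",(?,?)");
    }
    return builder.toString();
  }

  @Override
  public void run() {
    try (var db = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "postgres", "test");
         var s = db.prepareStatement(insertSql(COUNT))) {
      var pending = new ArrayList<Person>(); // rows bound but not yet sent
      for (Person p : people) {
        s.setString(2 * pending.size() + 1, p.name);
        s.setInt(2 * pending.size() + 2, p.number);
        pending.add(p);
        if (pending.size() == COUNT) {
          s.executeUpdate();
          pending.clear();
        }
      }
      // Flush leftover rows when the input size is not a multiple of COUNT.
      if (!pending.isEmpty()) {
        try (var last = db.prepareStatement(insertSql(pending.size()))) {
          int i = 0;
          for (Person p : pending) {
            last.setString(2 * i + 1, p.name);
            last.setInt(2 * i + 2, p.number);
            i++;
          }
          last.executeUpdate();
        }
      }
    } catch (SQLException e) {
      e.printStackTrace();
    }
  }
}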

public class Main {

  public static void main(String[] args) throws InterruptedException {
    // Generate 10M random rows up front so that only the inserts are timed.
    var data = new ArrayList<Person>();
    var r = new Random();
    for (int i = 0; i < 10000000; i++) {
      var p = new Person();
      p.name = String.valueOf(r.nextInt());
      p.number = r.nextInt();
      data.add(p);
    }

    // Give each thread one contiguous slice of the data; the last slice
    // also takes any remainder.
    var threads = new Thread[4];
    int chunkSize = data.size() / threads.length;
    for (int i = 0; i < threads.length; i++) {
      int from = chunkSize * i;
      int to = (i == threads.length - 1) ? data.size() : from + chunkSize;
      threads[i] = new Cannon(data.subList(from, to));
    }

    long start = System.currentTimeMillis();

    for (int i = 0; i < threads.length; i++) threads[i].start();
    for (int i = 0; i < threads.length; i++) threads[i].join();

    // Elapsed time in milliseconds.
    System.out.println(System.currentTimeMillis() - start);
  }

}

Each call to a Spring repository usually ends up as one call to the database. Even if the database is running on the same host, that is still an expensive operation (relative to calling a function within the JVM).

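So repository.saveAll(list) is handled as a single repository operation (whether that becomes one database round trip or several depends on the underlying store and its batching settings), while list.parallelStream().forEach(e -> repository.save(e)) issues a separate database call for every single item in your list.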

Even if your database can handle multiple requests at the same time, it is still limited by the resources of the host and potentially by the synchronisation mechanisms of the DBMS. In other words, the database can also handle only so many requests at a time.
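
Because the database can only serve a limited number of requests at once, it may help to cap how many saves are in flight instead of leaving that to a parallel stream. A minimal sketch, assuming the list has already been split into chunks: `BoundedSaver`, `saveConcurrently`, and `maxParallel` are hypothetical names, not part of Spring Data, and the `Consumer` stands in for a call such as `repository::saveAll`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Spring Data.
final class BoundedSaver {
    private BoundedSaver() {}

    /** Runs saver on every chunk with at most maxParallel chunks in flight at once. */
    static <T> void saveConcurrently(List<List<T>> chunks, int maxParallel, Consumer<List<T>> saver) {
        // A fixed pool is what enforces the concurrency limit.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (List<T> chunk : chunks) {
                pending.add(CompletableFuture.runAsync(() -> saver.accept(chunk), pool));
            }
            for (CompletableFuture<Void> f : pending) {
                f.join(); // waits for the chunk; throws CompletionException if it failed
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With Spring, a reasonable upper bound for `maxParallel` is probably the size of the connection pool, since every chunk being saved holds a connection of its own.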

Additionally, you don't want to save 2.5 million entries in a single call. This might end in an OutOfMemoryError.
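
One way to avoid a single huge call is to pass the list to the repository in fixed-size slices. A minimal sketch under stated assumptions: `ChunkedSaver` and `saveInChunks` are hypothetical names, the chunk size is something you would tune yourself, and the `Consumer` stands in for `repository::saveAll`. Each slice only gets its own transaction if the caller is not already inside one.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Spring Data.
final class ChunkedSaver {
    private ChunkedSaver() {}

    /**
     * Passes consecutive slices of at most chunkSize elements to saver
     * and returns the number of slices that were processed.
     */
    static <T> int saveInChunks(List<T> items, int chunkSize, Consumer<List<T>> saver) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // subList returns a view, so the elements are not copied.
            saver.accept(items.subList(from, to));
            chunks++;
        }
        return chunks;
    }
}
```

A call could then look like `ChunkedSaver.saveInChunks(list, 1_000, repository::saveAll);`, where 1,000 is only an illustrative value.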
