
How to insert 100,000 parent rows each with 200 child rows super fast?

I have a parent entity called OrderEvent and a child entity called PreCondition. One OrderEvent can have many PreConditions (>= 200). I need to save 100,000 OrderEvents plus 100,000 × 200 PreConditions. I used Repository.save(listOfOrderEvents) and saved to the DB every 1,000 records. It takes approximately 30 seconds to insert 1,000 OrderEvents.

It takes almost an hour to save all 100,000 OrderEvents.

Is there any way to bring this down below 2 minutes?

I tried the repository's save-entities method:

    public void parseOrder(String path, String collectionName) throws ParseException {
        preId = 0L;
        postId = 0L;
        eventId = 0L;

        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line = reader.readLine();
            String jobNumber = line.substring(0, 7).trim();
            int len = line.length();

            OrderEvent orderEvent = this.parseHeader(line, len, jobNumber, collectionName);
            long startTime = System.nanoTime();

            List<OrderEvent> list = new ArrayList<>();
            while ((line = reader.readLine()) != null) {
                jobNumber = line.substring(0, 7).trim();
                String recordType = line.substring(7, 9).trim();
                len = line.length();

                if (recordType.equals("0H")) {
                    // A new header record closes the previous OrderEvent; buffer it.
                    list.add(orderEvent);
                    orderEvent = this.parseHeader(line, len, jobNumber, collectionName);

                    if (list.size() == 1000) {
                        orderRepository.save(list);
                        list.clear();
                        long estimatedTime = System.nanoTime() - startTime;
                        System.out.println("Processed 1000 records in "
                                + estimatedTime / 1_000_000_000. + " second(s).");
                        startTime = System.nanoTime();
                    }
                } else if (recordType.equals("2F")) {
                    this.parseFeature(line, len, orderEvent);
                }
            }

            // Save the final OrderEvent and whatever is left of the last batch.
            list.add(orderEvent);
            orderRepository.save(list);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private OrderEvent parseHeader(String line, int len, String jobNumber, String collectionName)
            throws ParseException {
        String model = line.substring(9, 16).trim();
        String processDate = line.substring(len - 11, len - 3).trim();
        String formattedProcessDate = processDate.substring(0, 4) + "-"
                + processDate.substring(4, 6) + "-"
                + processDate.substring(6, 8) + " 00:00:00";

        OrderEvent orderEvent = new OrderEvent(jobNumber, UUID.randomUUID().toString(),
                collectionName, formatter.parse(formattedProcessDate));
        orderEvent.fillPrecondition("Model", "Stimulus", "OP_EQ", model);
        orderEvent.fillPostcondition("Add_Fact", "Coded", "Response", "True");
        return orderEvent;
    }

    private void parseFeature(String line, int len, OrderEvent orderEvent) {
        String feature = line.substring(len - 7, len).trim();
        orderEvent.fillPrecondition("Feature", "Stimulus", "OP_EQ", feature);
    }

This usually depends on the database setup, e.g. what the latency to the client is, which indexes are on the tables, how queries lock the tables, and so on.

Make sure that you understand how much time is spent in network operations. It could be the limiting factor, especially if your database sits on the other side of the world.

First establish the latency between the client and the database server. If it's 10 ms, then inserting these rows one by one would take: 100,000 × 200 × 10 ms = 200,000 s ≈ 56 h. This is very slow, so make sure you are using batch inserts with JDBC.
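Batch inserts buffer many parameter sets with addBatch() and send them in a single round trip with executeBatch(), so the per-row network latency is paid once per batch instead of once per row. A minimal sketch, assuming a pre_condition table and column names invented for illustration (the chunks() helper just splits the row list):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchInsert {

    /** Split a list into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> chunks(List<T> rows, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            result.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return result;
    }

    /**
     * Insert rows with JDBC batching: one round trip per executeBatch()
     * instead of one per row. Table and column names are illustrative.
     */
    static void insertPreconditions(Connection conn, List<Object[]> rows) throws SQLException {
        conn.setAutoCommit(false); // commit once per batch, not per row
        String sql = "INSERT INTO pre_condition (event_id, name, type, op, value)"
                + " VALUES (?, ?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<Object[]> batch : chunks(rows, 1000)) {
                for (Object[] row : batch) {
                    for (int i = 0; i < row.length; i++) {
                        ps.setObject(i + 1, row[i]);
                    }
                    ps.addBatch();
                }
                ps.executeBatch(); // single round trip for up to 1000 rows
                conn.commit();
            }
        }
    }
}
```

Note that with MySQL the driver only coalesces a batch into multi-row INSERT statements when `rewriteBatchedStatements=true` is set on the JDBC URL, and with Spring Data JPA / Hibernate you additionally need `hibernate.jdbc.batch_size` (plus `hibernate.order_inserts=true`) or each `save` still issues row-by-row statements.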

Sometimes the insertion process can be sped up significantly by creating shadow tables:

  1. Create new tables that are identical to the OrderEvents and PreCondition tables. Some RDBMSs allow the CREATE TABLE ... AS SELECT ... FROM ... syntax.
  2. Disable foreign keys and indexes on the shadow tables.
  3. Bulk insert all the data.
  4. Enable foreign keys and indexes on the shadow tables. This validates that the imported data is correct.
  5. Insert from the shadow tables into the actual tables, e.g. by running INSERT INTO ... SELECT ... FROM ....
  6. Drop the shadow tables.
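The steps above can be sketched as the SQL sequence a loader would execute. Everything here is illustrative: the table name is hypothetical, and the DDL for toggling constraints and indexes differs between RDBMSs (the DISABLE/ENABLE KEYS statements below are MySQL-flavoured syntax):

```java
import java.util.Arrays;
import java.util.List;

public class ShadowTableLoad {

    /**
     * Build the statement sequence for a shadow-table load.
     * Table name is illustrative; constraint/index DDL is RDBMS-specific
     * (MySQL-style shown here).
     */
    static List<String> shadowLoadStatements(String table) {
        String shadow = table + "_shadow";
        return Arrays.asList(
            // 1. clone the structure without copying any rows
            "CREATE TABLE " + shadow + " AS SELECT * FROM " + table + " WHERE 1 = 0",
            // 2. cheap inserts: no index maintenance during the load
            "ALTER TABLE " + shadow + " DISABLE KEYS",
            // 3. bulk insert happens between these steps (batched INSERTs or a bulk loader)
            // 4. re-enable: rebuilds indexes and validates the loaded data once
            "ALTER TABLE " + shadow + " ENABLE KEYS",
            // 5. move the validated rows into the real table
            "INSERT INTO " + table + " SELECT * FROM " + shadow,
            // 6. clean up
            "DROP TABLE " + shadow
        );
    }
}
```

Each string would be run through `Statement.execute` on a plain JDBC connection; the bulk insert itself (step 3) happens between the DISABLE and ENABLE statements.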

However, the best option would be to skip JDBC and switch to the bulk-load utility provided by your database, e.g. Oracle DB has External Tables and SQL*Loader. These tools are specifically designed to ingest large quantities of data efficiently, while JDBC is a general-purpose interface.
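Bulk loaders generally consume a flat file, so the parsing code from the question could write a staging CSV instead of entities and hand that file to the loader. A minimal RFC 4180-style writer sketch (the quoting rules shown are the common denominator that PostgreSQL's COPY and SQL Server's BULK INSERT both accept in CSV mode):

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class CsvStaging {

    /**
     * Write rows as CSV suitable for a bulk loader, e.g.
     * COPY ... FROM ... WITH (FORMAT csv) or BULK INSERT ... (FORMAT = 'CSV').
     * Fields containing a comma, quote or newline are wrapped in double
     * quotes with embedded quotes doubled (RFC 4180 style).
     */
    static void writeCsv(Writer out, List<String[]> rows) throws IOException {
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(escape(row[i]));
            }
            sb.append('\n');
            out.write(sb.toString());
        }
    }

    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return '"' + field.replace("\"", "\"\"") + '"';
        }
        return field;
    }
}
```

With the file in place, the load itself is a single server-side statement, e.g. (hypothetical path and table) `COPY pre_condition FROM '/tmp/pre_condition.csv' WITH (FORMAT csv);` on PostgreSQL.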

In C# I can use SqlBulkCopy for this type of task.

Maybe in Java there is an equivalent API, something like this: com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.

Something like this is better done using the DB server's bulk-processing operations. Yes, it is a totally different process, but it takes seconds, not even minutes.

Unfortunately, the HOWTO depends heavily on which SQL server you use:

MS SQL: BULK INSERT: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017

PostgreSQL: COPY: https://www.postgresql.org/docs/current/sql-copy.html
