
How to insert 100,000 parent rows each with 200 child rows super fast?

I have a parent entity called OrderEvent and a child entity called PreCondition. One OrderEvent can have many PreConditions (>= 200). I need to save 100,000 OrderEvents plus 100,000 * 200 PreConditions. I used Repository.save(list of OrderEvents) and saved to the DB every 1,000 records. It takes approximately 30 seconds to insert 1,000 OrderEvents.

It takes almost an hour to save all 100,000 OrderEvents.

Is there any way to bring this down to below 2 minutes?

I tried the save-entities method of the repository:

    public  void parseOrder(String path, String collectionName) throws ParseException {
        BufferedReader reader;
        Connection conn = (Connection) em.unwrap(java.sql.Connection.class);
        System.out.println(conn);
        try {
            reader = new BufferedReader(new FileReader(
                    path));
            String line = reader.readLine();

            String jobNumber =  line.substring(0, 7).trim();
            String recordType =  line.substring(7, 9).trim();
            Integer len = line.length();
            preId = 0L;
            postId = 0L;
            eventId = 0L;

            OrderEvent orderEvent = this.paraseHeader(line,len,jobNumber,collectionName);
            Integer count = 1;
            Integer batch = 0;
            long startTime = System.nanoTime();

            List<OrderEvent> list = new ArrayList<OrderEvent>();
            while (line != null) {
                line = reader.readLine();
                if (line == null) {
                    continue;
                }
                jobNumber =  line.substring(0, 7).trim();
                recordType =  line.substring(7, 9).trim();
                len = line.length();

                if (recordType.equals("0H")) { 

                    count++;
                    batch++;
                    if (batch.equals(1000)) {
                        orderRepository.save(list);
                        list.clear();
                        long estimatedTime = System.nanoTime() - startTime;
                        System.out.println("Processed " +  batch + " records in " +  estimatedTime / 1_000_000_000.  +  " second(s).");

                        batch = 0;
                        startTime = System.nanoTime();
                    }


                    list.add(orderEvent);
                    //orderRepository.saveAndFlush(orderEvent);
                    orderEvent = this.paraseHeader(line,len,jobNumber,collectionName);

                } else if (recordType.equals("2F")) { 
                    this.paraseFeature(line,len,jobNumber,orderEvent);
                }
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private  OrderEvent paraseHeader (String line,Integer len,String jobNumber,String collectionName) throws ParseException {

            String model = line.substring(9, 16).trim();
            String processDate =  line.substring(len-11,len-3).trim();
            String formattedProcessDate =  processDate.substring(0,4) + "-" + 
                    processDate.substring(4,6) +"-" + processDate.substring(6,8) + " 00:00:00";

            //eventId++;

            OrderEvent orderEvent = new OrderEvent(jobNumber,UUID.randomUUID().toString(),collectionName,
                    formatter.parse(formattedProcessDate));

        //  preId++;
            //postId++;
            orderEvent.fillPrecondition("Model", "Stimulus", "OP_EQ", model);
            orderEvent.fillPostcondition("Add_Fact","Coded","Response","True");


            return orderEvent;
    }
    private  void paraseFeature (String line,Integer len, String jobNumber, OrderEvent orderEvent) {

    //  preId++;
        String feature = line.substring(len-7,len).trim();
        orderEvent.fillPrecondition("Feature", "Stimulus", "OP_EQ", feature);
    }

This usually depends on the database setup, e.g. what the latency to the client is, what indexes exist on the tables, how queries lock the tables, and so on.

Make sure that you understand how much time is spent on network operations. It could be the limiting factor, especially if your database sits on the other side of the world.

First, establish the latency between the client and the database server. If it's 10 ms, then inserting these rows one by one would take: 100,000 * 200 * 10 ms = 200,000 s ≈ 56 h. This is very slow, so make sure you are using batch inserts with JDBC.
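For illustration, a minimal JDBC batching sketch might look like the following; the table and column names are assumptions made up for this example, not taken from the question, and values are bound as strings purely for brevity:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Hypothetical helper: batch-insert pre-condition rows over plain JDBC.
    // Table and column names are assumptions; adapt them to the real schema.
    static void insertPreConditions(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO pre_condition (order_event_id, name, type, op, value) "
                + "VALUES (?, ?, ?, ?, ?)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]);   // bound as strings for brevity
                }
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch();   // one round trip for 1,000 rows instead of 1,000 round trips
                }
            }
            ps.executeBatch();           // flush the last partial batch
            conn.commit();
        }
    }

If you stay with Spring Data JPA / Hibernate instead of dropping to raw JDBC, the usual knobs are hibernate.jdbc.batch_size together with hibernate.order_inserts; note that IDENTITY-generated primary keys disable Hibernate's insert batching, so a sequence-based generator tends to work better for bulk loads.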

Sometimes the insertion process can be sped up significantly by creating shadow tables (a JDBC sketch follows the list below):

  1. Create new tables that are identical to the OrderEvent and PreCondition tables. Some RDBMS allow the CREATE TABLE ... AS SELECT ... FROM ... syntax.
  2. Disable foreign keys and indexes on the shadow tables.
  3. Bulk insert all the data.
  4. Enable foreign keys and indexes on the shadow tables. This will hopefully ensure that the imported data is correct.
  5. Insert from the shadow tables into the actual tables, e.g. by running INSERT INTO ... SELECT ... FROM ....
  6. Delete the shadow tables.
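A rough JDBC sketch of that flow; the SQL and the shadow-table names (order_event_shadow, pre_condition_shadow) are invented for this example and the exact syntax will need to be adapted to your RDBMS:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Rough sketch of the shadow-table flow; SQL syntax varies between databases.
    static void loadViaShadowTables(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            // 1. shadow tables with the same structure, but without constraints or indexes
            st.execute("CREATE TABLE order_event_shadow AS SELECT * FROM order_event WHERE 1 = 0");
            st.execute("CREATE TABLE pre_condition_shadow AS SELECT * FROM pre_condition WHERE 1 = 0");

            // 3. the bulk insert into the shadow tables goes here (batched JDBC or a DB bulk loader)

            // 4. add constraints afterwards, so they are validated once instead of per row
            st.execute("ALTER TABLE pre_condition_shadow ADD CONSTRAINT fk_pre_shadow_event "
                    + "FOREIGN KEY (order_event_id) REFERENCES order_event_shadow (id)");

            // 5. move the validated data into the real tables with set-based statements
            st.execute("INSERT INTO order_event SELECT * FROM order_event_shadow");
            st.execute("INSERT INTO pre_condition SELECT * FROM pre_condition_shadow");

            // 6. drop the shadow tables
            st.execute("DROP TABLE pre_condition_shadow");
            st.execute("DROP TABLE order_event_shadow");
        }
    }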

However, the best option would be to skip JDBC and switch to the bulk load utility provided by your database, e.g. Oracle DB has external tables and SQL*Loader. These tools are specifically designed to ingest large quantities of data efficiently, while JDBC is a general-purpose interface.

In C# I can use SqlBulkCopy for this type of task.

Maybe in Java there is an equivalent API. Something like this: com.microsoft.sqlserver.jdbc.SQLServerBulkCopy
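As a rough sketch only, assuming the Microsoft mssql-jdbc driver is on the classpath; the connection string, file name, table, and column layout below are invented for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;
    import com.microsoft.sqlserver.jdbc.SQLServerBulkCSVFileRecord;
    import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;

    // Sketch: stream a CSV file into SQL Server using the driver's bulk copy API.
    public class BulkCopyExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:sqlserver://localhost;databaseName=orders;user=app;password=...");
                 SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn);
                 SQLServerBulkCSVFileRecord csv =
                         new SQLServerBulkCSVFileRecord("pre_condition.csv", "UTF-8", ",", false)) {
                // describe the CSV columns: position in file, name, JDBC type, precision, scale
                csv.addColumnMetadata(1, "order_event_id", Types.BIGINT, 0, 0);
                csv.addColumnMetadata(2, "name", Types.VARCHAR, 50, 0);
                csv.addColumnMetadata(3, "value", Types.VARCHAR, 255, 0);

                bulkCopy.setDestinationTableName("pre_condition");
                bulkCopy.writeToServer(csv);   // one streaming bulk load instead of row-by-row inserts
            }
        }
    }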

Something like that is better done using the DB server's BULK-processing operations. Yes, it is a totally different process, but it will take seconds, not even minutes.

Unfortunately, the HOWTO is very dependent on the SQL server:

MS SQL: BULK INSERT: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017

PostgreSQL: COPY: https://www.postgresql.org/docs/current/sql-copy.html
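On the PostgreSQL side, the JDBC driver exposes COPY through org.postgresql.copy.CopyManager; a rough sketch, where the connection details, file name, and table layout are invented for the example:

    import java.io.FileReader;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.copy.CopyManager;
    import org.postgresql.core.BaseConnection;

    // Sketch: load a CSV file into PostgreSQL with COPY via the JDBC driver's CopyManager.
    public class PgCopyExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/orders", "app", "secret")) {
                CopyManager copy = new CopyManager(conn.unwrap(BaseConnection.class));
                try (Reader reader = new FileReader("pre_condition.csv")) {
                    long rows = copy.copyIn(
                            "COPY pre_condition (order_event_id, name, value) FROM STDIN WITH (FORMAT csv)",
                            reader);
                    System.out.println("Loaded " + rows + " rows");
                }
            }
        }
    }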
