How to select and insert a million records faster from a Java program

Question

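I am trying to select about a million records from a Redshift table and then insert them back into a Redshift table (after some manipulation).

However, this takes a very long time. I waited about an hour for the program to terminate, with no luck. The console also seems to get stuck: it prints a few of the print statements and then nothing more.

Trying the same thing with 100 records works fine and takes about 2 minutes.

Here is part of my code: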

        conn.setAutoCommit(false);
        stmt = conn.createStatement();
        stmt.setFetchSize(100);
        ResultSet rsSelect = stmt.executeQuery("select * from table");
        System.out.println("select done !");

        String queryInsert = "insert into table"
                +"(event_id,domain_userid,collector_tstamp,se_category,se_action,se_label,se_property)"
                +"values(?,?,?,?,?,?,?)";

        PreparedStatement preparedStatement = conn.prepareStatement(queryInsert);
        final int batchSize = 10000;
        int count = 0;
        System.out.println("about to go into loop !");


        while(rsSelect.next()){

            String event_id = rsSelect.getString("event_id");
            String domain_userid = rsSelect.getString("domain_userid");
            Timestamp collector_tstamp = rsSelect.getTimestamp("collector_tstamp");
            String se_category = rsSelect.getString("se_category");
            String se_action = rsSelect.getString("se_action");
            String se_label = rsSelect.getString("se_label");
            String se_property = rsSelect.getString("se_property");

            //some manipulations

            preparedStatement.setString(1, event_id);
            preparedStatement.setString(2, domain_userid);
            preparedStatement.setTimestamp(3, collector_tstamp);
            preparedStatement.setString(4, se_category);
            preparedStatement.setString(5, se_action);
            preparedStatement.setString(6, se_label);                        
            preparedStatement.setString(7, se_property);
            preparedStatement.addBatch(); 

            if(++count % batchSize == 0){
                preparedStatement.executeBatch();
                System.out.println("batch execution!");

            }               
        }
        System.out.println("out of loop");
        preparedStatement.executeBatch();
        preparedStatement.close();
        conn.commit();
        conn.close();

Answer 1

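I ran into the same problem: inserting data from one Redshift table into another took far too long (I was using node.js). Initially it took me about 18 minutes to insert 1 million records.

I found that the data in the table was not sorted by the sort key (a timestamp). The data has to be sorted by the sort key, and the sort key should be used in the WHERE predicate (if you have one).
Run VACUUM on the table TO 100 PERCENT to sort the data. After any later operations on the table, make sure the data is still sorted by the sort key.

After doing this, the result was better than I expected: 1 million records inserted in 3 seconds.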
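Since the question uses JDBC, the vacuum step could be issued from Java along the following lines. This is only a sketch under stated assumptions: the class `VacuumSketch`, its two helper methods, and the table name `events` are hypothetical and not from the original posts. It also assumes that VACUUM cannot run inside a transaction block, so the sketch switches the connection to auto-commit for that one statement (note that the question's code sets `setAutoCommit(false)`).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class VacuumSketch {

    // Hypothetical helper: builds the Redshift VACUUM statement for one table.
    // The name is checked against a simple identifier pattern because it is
    // concatenated into the SQL text (identifiers cannot be bound as parameters).
    static String buildVacuumSql(String tableName) {
        if (tableName == null || !tableName.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("unexpected table name: " + tableName);
        }
        return "VACUUM FULL " + tableName + " TO 100 PERCENT";
    }

    // Hypothetical helper: runs the VACUUM on an already opened connection.
    // Assumption: VACUUM must run outside a transaction, so auto-commit is
    // enabled for the statement and the previous mode is restored afterwards.
    // Enabling auto-commit commits any transaction that is currently open.
    static void vacuum(Connection conn, String tableName) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(true);
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildVacuumSql(tableName));
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A call such as `VacuumSketch.vacuum(conn, "events")` would be made once, before the select-and-insert run, rather than inside the batch loop.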

Solution 1
0 2016-09-26 14:04:00