
How to select and insert a million records faster through a Java program

I am trying to select around 1 million records from a Redshift table and then insert them back into a Redshift table (after doing some manipulations).

But that is taking very long. I waited around 1 hour for the program to terminate, but no luck. The console also does not print my print statements; it appears stuck after printing the first few.

I tried the same for 100 records; it works fine and takes around 2 minutes.
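At that rate (roughly 2 minutes per 100 rows), 1 million rows would extrapolate to about 10,000 × 2 min ≈ 20,000 minutes, which is close to two weeks, so the per-row round trips clearly do not scale.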

This is part of my code:

        conn.setAutoCommit(false);
        stmt = conn.createStatement();
        stmt.setFetchSize(100);
        ResultSet rsSelect = stmt.executeQuery("select * from table");
        System.out.println("select done !");

        String queryInsert = "insert into table"
                +"(event_id,domain_userid,collector_tstamp,se_category,se_action,se_label,se_property)"
                +"values(?,?,?,?,?,?,?)";

        PreparedStatement preparedStatement = conn.prepareStatement(queryInsert);
        final int batchSize = 10000;
        int count = 0;
        System.out.println("about to go into loop !");


        while(rsSelect.next()){

            String event_id = rsSelect.getString("event_id");
            String domain_userid = rsSelect.getString("domain_userid");
            Timestamp collector_tstamp = rsSelect.getTimestamp("collector_tstamp");
            String se_category = rsSelect.getString("se_category");
            String se_action = rsSelect.getString("se_action");
            String se_label = rsSelect.getString("se_label");
            String se_property = rsSelect.getString("se_property");

            //some manipulations

            preparedStatement.setString(1, event_id);
            preparedStatement.setString(2, domain_userid);
            preparedStatement.setTimestamp(3, collector_tstamp);
            preparedStatement.setString(4, se_category);
            preparedStatement.setString(5, se_action);
            preparedStatement.setString(6, se_label);                        
            preparedStatement.setString(7, se_property);
            preparedStatement.addBatch(); 

            if(++count % batchSize == 0){
                preparedStatement.executeBatch();
                System.out.println("batch execution!");

            }               
        }
        System.out.println("out of loop");
        preparedStatement.executeBatch();
        preparedStatement.close();
        conn.commit();
        conn.close();   

I have had the same problem: it was taking too long to insert data from one Redshift table into another Redshift table (I used node.js). Initially it took me around 18 minutes to insert 1 million records.

I figured out that the data in my table was not sorted according to the sort key (a timestamp). It is important that your data is sorted by the sort key, and that you use that sort key in your WHERE predicate (if you have one).
Run VACUUM on the table to 100 percent (VACUUM table_name TO 100 PERCENT) to sort your data. After your manipulations, make sure you order your data by the sort key before inserting.
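For illustration, a minimal JDBC sketch of this idea could look like the following; source_table, the connection placeholders and collector_tstamp as the sort key are assumptions based on the question, not my actual setup:

    import java.sql.*;

    public class SortedCopy {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:redshift://<host>:5439/<db>";   // placeholder connection details
            try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>")) {

                // VACUUM cannot run inside a transaction block, so keep autocommit on for it.
                conn.setAutoCommit(true);
                try (Statement vacuumStmt = conn.createStatement()) {
                    // Re-sorts the table by its sort key so the subsequent scan is fast.
                    vacuumStmt.execute("vacuum source_table to 100 percent");
                }

                // Read the rows already ordered by the sort key, then keep the
                // batched PreparedStatement insert loop from the question unchanged.
                conn.setAutoCommit(false);
                try (Statement stmt = conn.createStatement()) {
                    stmt.setFetchSize(10000);
                    ResultSet rs = stmt.executeQuery(
                        "select * from source_table order by collector_tstamp");
                    while (rs.next()) {
                        // ... manipulations + addBatch()/executeBatch() as in the code above ...
                    }
                }
                conn.commit();
            }
        }
    }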

After doing this, I got a result I did not expect: 1 million records inserted in about 3 seconds.
