
Java thread not progressing in a JDBC program

I am trying to find the row counts of all tables in a database on both the source and the destination, the source being Greenplum and the destination being Hive (on HDFS). To process both ends in parallel, I created two threads that call the methods computing the counts independently. The code can be seen below:

new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            gpTableCount   = getGpTableCount();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();

new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            hiveTableCount = getHiveTableCount();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();

while(!(gpTableCount != null && gpTableCount.size() > 0 && hiveTableCount != null && hiveTableCount.size() > 0)) {
    Thread.sleep(5000);
}

The results of both threads are stored in two separate Java HashMaps. Below is the method that calculates the GP counts. The method that calculates the Hive counts is identical except for the database name, so I am only showing one of them.

public Map<String,String> getGpTableCount() throws SQLException {
    Connection gpAnalyticsCon       = (Connection) DbManager.getGpConnection();
    while(keySetIterator_gpTableList.hasNext()) {
        gpTabSchemakey  = keySetIterator_gpTableList.next();
        tablesnSSNs     = gpTabSchemakey.split(",");
        target          = tablesnSSNs[1].split(":");
        analyticsTable  = target[0].split("\\.");   
        gpCountQuery    = "select '" + analyticsTable[1] + "' as TableName, count(*) as Count, source_system_name, max(xx_last_update_tms) from " + tablesnSSNs[0] + " where source_system_name = '" + target[1] + "' group by source_system_name";
        try {
            gp_pstmnt            = gpAnalyticsCon.prepareStatement(gpCountQuery);
            ResultSet gpCountRs  = gp_pstmnt.executeQuery();
            while(gpCountRs.next()) {
                gpCountRs.getLong(2) + ", Max GP Tms: " + gpCountRs.getTimestamp(4).toString());
                gpDataMap.put(gpCountRs.getString(1) + "," + gpCountRs.getString(3), gpCountRs.getLong(2) + "," + gpCountRs.getTimestamp(4).toString());
            }
        } catch(org.postgresql.util.PSQLException e) {
            e.printStackTrace();
        } catch(SQLException e) {
            e.printStackTrace();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
    System.out.println("GP Connection closed");
    gp_pstmnt.close();
    gpAnalyticsCon.close();
    return gpDataMap;
}

Hive's Method:

public Map<String, String> getHiveTableCount() throws IOException, SQLException {
    Connection hiveConnection       = DbManager.getHiveConnection();
    while(hiveIterator.hasNext()) {
        gpHiveRec   = hiveIterator.next();
        hiveArray   = gpHiveRec.split(",");
        hiveDetails = hiveArray[1].split(":");
        hiveTable   = hiveDetails[0].split("\\.");
        hiveQuery   = "select '" + hiveTable[1] + "' as TableName, count(*) as Count, source_system_name, max(xx_last_update_tms) from " + hiveDetails[0] + " where source_system_name='" + hiveDetails[1] + "' group by source_system_name";
        try {
            hive_pstmnt             = hiveConnection.prepareStatement(hiveQuery);
            ResultSet hiveCountRs   = hive_pstmnt.executeQuery();
            while(hiveCountRs.next()) {
                hiveDataMap.put(hiveCountRs.getString(1) + "," + hiveCountRs.getString(3), hiveCountRs.getLong(2) + "," + hiveCountRs.getTimestamp(4).toString());
            }
        } catch(HiveSQLException e) {
            e.printStackTrace();
        } catch(SQLException e) {
            e.printStackTrace();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
    return hiveDataMap;
}

When the jar is submitted, both threads are launched and the SQL queries for GP and Hive start executing simultaneously. The problem is that as soon as the GP thread finishes executing getGpTableCount(), I see the print statement GP Connection closed, and the Hive thread then hangs for at least 30 minutes before resuming. I checked for locks on the Hive tables in case any were locked, but there were none. After 30-40 minutes the Hive thread starts again and finishes. This happens even with a small number of tables (around 20) on Hive.

This is how I submit the jar:

/usr/jdk64/jdk1.8.0_112/bin/java -Xdebug -Dsun.security.krb5.debug=true -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.krb5.realm=PROD.COM -Djava.security.krb5.kdc=ip-xx-xxx-xxx-xxx.ec2.internal -Djavax.security.auth.useSubjectCredsOnly=false -jar /home/etl/ReconTest/ReconAuto_Test_Prod.jar

Could anyone let me know if there is an issue with the way I create the threads in this code, and how I can fix it?

Assuming your gpTableCount and hiveTableCount are plain HashMaps, you're running into synchronization issues.

This is too broad a topic to explain fully here, but here's a short intro:

Since the maps are populated in different threads, your main thread does not 'see' those changes until the memory is synchronized. There is no guarantee of when that happens (and it's best to assume it never will unless you force it).

To do this properly, either use thread-safe versions (see Collections.synchronizedMap or ConcurrentHashMap), or manually synchronize your checks on the same monitor (i.e. put the check itself in a synchronized method, and put the code that populates the map in a synchronized method, too). Alternatively, you could store the counts themselves in two volatile ints and update those from the two worker threads.
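Here is a minimal sketch of that suggestion, using ConcurrentHashMap for visibility of the map entries and a CountDownLatch instead of the sleep-polling loop. The class name CountCollector and the stand-in put() calls are hypothetical; in the real program they would be replaced by the getGpTableCount() and getHiveTableCount() JDBC calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CountCollector {
    // Thread-safe maps: writes from worker threads become visible to the main thread.
    private final Map<String, String> gpTableCount = new ConcurrentHashMap<>();
    private final Map<String, String> hiveTableCount = new ConcurrentHashMap<>();
    // One latch count per worker thread.
    private final CountDownLatch done = new CountDownLatch(2);

    public void collect() throws InterruptedException {
        new Thread(() -> {
            try {
                // Stand-in for getGpTableCount() populating the map.
                gpTableCount.put("gp_table,src", "42,2020-01-01 00:00:00");
            } finally {
                done.countDown(); // always signal, even if the query fails
            }
        }).start();

        new Thread(() -> {
            try {
                // Stand-in for getHiveTableCount() populating the map.
                hiveTableCount.put("hive_table,src", "42,2020-01-01 00:00:00");
            } finally {
                done.countDown();
            }
        }).start();

        // Blocks without polling; await() also establishes happens-before,
        // so both maps are fully visible once it returns.
        done.await();
    }

    public Map<String, String> gp() { return gpTableCount; }
    public Map<String, String> hive() { return hiveTableCount; }

    public static void main(String[] args) throws InterruptedException {
        CountCollector c = new CountCollector();
        c.collect();
        System.out.println("GP entries: " + c.gp().size()
                + ", Hive entries: " + c.hive().size());
    }
}
```

Compared with the sleep(5000) loop, the latch wakes the main thread as soon as both workers finish, and the countDown() in finally guarantees the main thread is never left waiting when a worker throws.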
