
Java thread not progressing in a JDBC program

I am trying to find the count of rows in all tables of a database on both the source and the destination, the source being Greenplum and the destination being Hive (on HDFS). To do the parallel processing, I have created two threads which call the methods that calculate the counts on the two ends independently. The code can be seen below:

new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            gpTableCount   = getGpTableCount();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();

new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            hiveTableCount = getHiveTableCount();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();

while(!(gpTableCount != null && gpTableCount.size() > 0 && hiveTableCount != null && hiveTableCount.size() > 0)) {
    Thread.sleep(5000);
}

The results of both threads are stored in two separate Java HashMaps. Below is the method for calculating the GP counts. The method for calculating the Hive counts is the same except for the database name, hence I have given just the one method.

public Map<String,String> getGpTableCount() throws SQLException {
    Connection gpAnalyticsCon       = (Connection) DbManager.getGpConnection();
    while(keySetIterator_gpTableList.hasNext()) {
        gpTabSchemakey  = keySetIterator_gpTableList.next();
        tablesnSSNs     = gpTabSchemakey.split(",");
        target          = tablesnSSNs[1].split(":");
        analyticsTable  = target[0].split("\\.");   
        gpCountQuery    = "select '" + analyticsTable[1] + "' as TableName, count(*) as Count, source_system_name, max(xx_last_update_tms) from " + tablesnSSNs[0] + " where source_system_name = '" + target[1] + "' group by source_system_name";
        try {
            gp_pstmnt            = gpAnalyticsCon.prepareStatement(gpCountQuery);
            ResultSet gpCountRs  = gp_pstmnt.executeQuery();
            while(gpCountRs.next()) {
                System.out.println("Count: " + gpCountRs.getLong(2) + ", Max GP Tms: " + gpCountRs.getTimestamp(4).toString());
                gpDataMap.put(gpCountRs.getString(1) + "," + gpCountRs.getString(3), gpCountRs.getLong(2) + "," + gpCountRs.getTimestamp(4).toString());
            }
        } catch(org.postgresql.util.PSQLException e) {
            e.printStackTrace();
        } catch(SQLException e) {
            e.printStackTrace();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
    System.out.println("GP Connection closed");
    gp_pstmnt.close();
    gpAnalyticsCon.close();
    return gpDataMap;
}

Hive's method:

public Map<String, String> getHiveTableCount() throws IOException, SQLException {
    Connection hiveConnection       = DbManager.getHiveConnection();
    while(hiveIterator.hasNext()) {
        gpHiveRec   = hiveIterator.next();
        hiveArray   = gpHiveRec.split(",");
        hiveDetails = hiveArray[1].split(":");
        hiveTable   = hiveDetails[0].split("\\.");
        hiveQuery   = "select '" + hiveTable[1] + "' as TableName, count(*) as Count, source_system_name, max(xx_last_update_tms) from " + hiveDetails[0] + " where source_system_name='" + hiveDetails[1] + "' group by source_system_name";
        try {
            hive_pstmnt             = hiveConnection.prepareStatement(hiveQuery);
            ResultSet hiveCountRs   = hive_pstmnt.executeQuery();
            while(hiveCountRs.next()) {
                hiveDataMap.put(hiveCountRs.getString(1) + "," + hiveCountRs.getString(3), hiveCountRs.getLong(2) + "," + hiveCountRs.getTimestamp(4).toString());
            }
        } catch(HiveSQLException e) {
            e.printStackTrace();
        } catch(SQLException e) {
            e.printStackTrace();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
    return hiveDataMap;
}

When the jar is submitted, both threads are launched and the SQL queries for GP and Hive start executing simultaneously. But the problem here is: as soon as the GP thread finishes executing getGpTableCount(), I see the print statement GP Connection closed, and the Hive thread hangs for at least 30 minutes before resuming. I checked for locks on the Hive tables in case any were held, but there were none. After 30-40 minutes, the Hive thread starts again and finishes. This happens even with a small number of tables (like 20) on Hive.

This is how I submit the jar:

/usr/jdk64/jdk1.8.0_112/bin/java -Xdebug -Dsun.security.krb5.debug=true -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.krb5.realm=PROD.COM -Djava.security.krb5.kdc=ip-xx-xxx-xxx-xxx.ec2.internal -Djavax.security.auth.useSubjectCredsOnly=false -jar /home/etl/ReconTest/ReconAuto_Test_Prod.jar

Could anyone let me know if there is any issue with the way I create threads in the code, and how I can fix it?

Assuming your gpTableCount and hiveTableCount are plain HashMaps, you're running into synchronization issues.

This is too broad a topic to explain fully here, but here's a short intro:

Since they are populated in different threads, your main thread does not 'see' these changes until the memory is synchronized. There's no guarantee when this happens (and it's best to assume it will never happen unless you force it).

To do this properly, either use thread-safe versions (see Collections.synchronizedMap or ConcurrentHashMap), or manually synchronize your checks on the same monitor (i.e. put the check itself in a synchronized method, and put the code that populates the map in a synchronized method, too). Alternatively, you could put the counts themselves in two volatile ints and update those in the two worker threads.
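For illustration, here's a minimal, self-contained sketch of the volatile approach applied to the map references themselves (the field names follow the question; loadCounts is a hypothetical stand-in for the real JDBC methods). Because each worker thread builds its map privately and then assigns it to a volatile field in a single step, the main thread's polling loop is guaranteed to see both the reference and the fully populated contents:

import java.util.HashMap;
import java.util.Map;

public class CountRunner {
    // volatile: once a worker assigns the field, that write (and every
    // write the worker made to the map before it) is visible to the
    // main thread on its next read of the field
    private volatile Map<String, String> gpTableCount;
    private volatile Map<String, String> hiveTableCount;

    public void run() throws InterruptedException {
        new Thread(() -> gpTableCount = loadCounts("gp")).start();
        new Thread(() -> hiveTableCount = loadCounts("hive")).start();

        // same polling loop as in the question, now reading volatile fields
        while (gpTableCount == null || gpTableCount.isEmpty()
                || hiveTableCount == null || hiveTableCount.isEmpty()) {
            Thread.sleep(5000);
        }
        System.out.println("GP: " + gpTableCount + ", Hive: " + hiveTableCount);
    }

    // hypothetical stand-in for getGpTableCount() / getHiveTableCount()
    private Map<String, String> loadCounts(String source) {
        Map<String, String> counts = new HashMap<>();
        counts.put(source + ".some_table,src_sys", "42,2020-01-01 00:00:00.0");
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        new CountRunner().run();
    }
}

Note that volatile only publishes the one-shot assignment; it works here because the maps are never modified after being published. If the main thread needed to observe a map while a worker is still filling it, a ConcurrentHashMap (or Collections.synchronizedMap) populated in place would be the safer choice.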
