
Spark throws NullPointerException in JdbcRDD on a Cluster with 2 workers

I am running a Spark cluster with 2 workers, each with 60 GB of memory.

I have written the code below for JdbcRDD.

String sql = "SELECT * FROM( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS Row,"
           + " * FROM [Table_1]) A WHERE Row >= ? AND Row <= ? ";

    SparkContext sctx = new SparkContext(getSparkConf());
    try {
        // Partition the query into 200 tasks over row numbers 0..rowCount;
        // the two ? placeholders are bound to each partition's bounds.
        JdbcRDD<List> jdbcRdd = new JdbcRDD(sctx, new GetJDBCConnection(), sql,
                0, rowCount, 200, new GetJDBCResult(),
                scala.reflect.ClassTag$.MODULE$.AnyRef());

        Object[] bb = (Object[]) jdbcRdd.collect();

        System.out.println("Length of Object array : " + bb.length);
        System.out.println("JdbcRDD:- " + Arrays.toString(bb));
    } catch (Exception e) {
        e.printStackTrace();
    }

and the code for GetJDBCResult is

class GetJDBCResult extends AbstractFunction1<ResultSet, List> implements Serializable {

    private static final long serialVersionUID = -78825308090L;

    public List apply(ResultSet rs) {
        List lst = new ArrayList();
        try {
            System.out.println("In apply method");
            System.out.println("resultSet : -" + rs);
            int cols = rs.getMetaData().getColumnCount();
            System.out.println("no of columns : " + cols);
            // Copy every column of the current row into the list.
            for (int i = 1; i <= cols; i++) {
                Object result = rs.getObject(i);
                System.out.println("Object : -" + result);
                lst.add(result);
            }
            System.out.println("result->" + lst);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return lst;
    }
}

The above code runs fine when I run Spark in standalone mode (local[*]), but if I use the cluster environment, it throws the error below:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 39, DD1AI7511): java.lang.NullPointerException: 
    org.apache.spark.rdd.JdbcRDD$$anon$1.<init>(JdbcRDD.scala:74)
    org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:70)
    org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:50)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    java.lang.Thread.run(Thread.java:722)

Driver stacktrace:

There are no traces/logs in the worker logs. Am I doing something wrong here? Does anybody have any idea?

If you look at line 74 in JdbcRDD, it is very clear that this is due to the database connection being null.

https://github.com/apache/spark/blob/655699f8b7156e8216431393436368e80626cdb2/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
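Since GetJDBCConnection is not shown in the question, one common cause is a connection factory that swallows the exception and returns null on the workers (for example, when the JDBC driver jar is not on the executor classpath). Below is a minimal sketch of a factory that fails loudly instead; the SQL Server driver class, URL, and credentials are assumptions, not the asker's actual values:

    import java.io.Serializable;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import scala.runtime.AbstractFunction0;

    class GetJDBCConnection extends AbstractFunction0<Connection> implements Serializable {
        private static final long serialVersionUID = 1L;

        public Connection apply() {
            try {
                // Load the JDBC driver explicitly; this throws if the driver jar
                // is missing on the executor classpath.
                Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
                // Hypothetical URL and credentials; replace with your own.
                return DriverManager.getConnection(
                        "jdbc:sqlserver://dbhost:1433;databaseName=mydb", "user", "password");
            } catch (Exception e) {
                // Rethrow instead of returning null, so the real cause shows up
                // in the executor logs rather than a bare NullPointerException.
                throw new RuntimeException("Could not open JDBC connection on worker", e);
            }
        }
    }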

Since it is working locally but not on the cluster, check whether your cluster instances have access to the database. If this is on EC2, please make sure your firewall rules are correct. Also make sure the Spark cluster is running in the same VPC as your database. The best way to verify this is to SSH into one of your Spark slave nodes and see whether you can connect to the database remotely from there.
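For that check, a tiny standalone program run on the slave node itself is enough; again, the driver class, URL, and credentials here are placeholders for your own:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class DbPing {
        public static void main(String[] args) throws Exception {
            // Use the same driver and URL the Spark job uses.
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            try (Connection c = DriverManager.getConnection(
                    "jdbc:sqlserver://dbhost:1433;databaseName=mydb", "user", "password")) {
                System.out.println("Connected: " + !c.isClosed());
            }
        }
    }

If this fails on the worker but succeeds on your local machine, the problem is network or firewall configuration, not the Spark code.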
