How to connect to Phoenix from Apache Spark 2.x using Java
Surprisingly, I couldn't find any up-to-date Java documentation for this on the web. The one or two examples on the entire World Wide Web are too old. I came up with the following, which fails with the error 'Module not Found org.apache.phoenix.spark', but that module is definitely part of the jar. I don't think this approach is right, because it is copy-pasted from different examples, and loading a module like this is a bit of an anti-pattern, since we already have the package as part of the jar. Please show me the right way.

Note: please do not give a Scala or Python example; those are easily available on the net.
import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoad {
    public static void main(String[] args) {
        // Create a SparkSession to initialize
        String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
        SparkSession spark = SparkSession
                .builder()
                .appName("ECLoad")
                .master("local")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .getOrCreate();
        spark.conf().set("spark.testing.memory", "2147480000"); // in case you face any memory issue

        Dataset<Row> df = spark.sqlContext().read()
                .format("org.apache.phoenix.spark.*")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();
        df.show();
    }
}
I'm trying to run it as
spark-submit --class "encountercases.ECLoad" --jars phoenix-spark-5.0.0-HBase-2.0.jar,phoenix-core-5.0.0-HBase-2.0.jar --master local ./PASpark-1.0-SNAPSHOT.jar
and I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
I see the required jars are already at the suggested path and the hbase-site.xml symlink exists.
Before getting Phoenix working with Spark, you will need to set up the environment for Spark so that it knows how to access Phoenix/HBase.
First, create a symbolic link to hbase-site.xml:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

Alternatively, you can add this file while creating the Spark session or in the Spark defaults.
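As a sketch of that alternative (assuming the same HDP-style paths as the symlink command; whether your jobs pick the file up this way depends on your distribution's classpath handling):

```shell
# Option A: point every job at hbase-site.xml via spark-defaults.conf
echo "spark.files file:///etc/hbase/conf/hbase-site.xml" >> /etc/spark2/conf/spark-defaults.conf

# Option B: ship the file with a single job at submit time
spark-submit --files /etc/hbase/conf/hbase-site.xml ...
```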
You will need to add the jars under /usr/hdp/current/phoenix-client/ to the driver as well as the executor class path. The parameters to set are spark.driver.extraClassPath and spark.executor.extraClassPath.
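Putting both parameters together, a submit command might look like this (a sketch only; the exact jar name under /usr/hdp/current/phoenix-client/ depends on your HDP build, so adjust the path for your cluster):

```shell
# Hypothetical path -- check what actually sits under /usr/hdp/current/phoenix-client/
PHOENIX_JAR=/usr/hdp/current/phoenix-client/phoenix-client.jar

spark-submit \
  --class encountercases.ECLoad \
  --master local \
  --conf spark.driver.extraClassPath="$PHOENIX_JAR" \
  --conf spark.executor.extraClassPath="$PHOENIX_JAR" \
  ./PASpark-1.0-SNAPSHOT.jar
```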
This step is trivial and can easily be translated into Java/Scala/Python/R; the two steps above are critical for it to work, since they set up the environment:

val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "CLINICAL.ENCOUNTER_CASES", "zkUrl" -> "localhost:2181"))
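Since the question asked for Java specifically, the Scala one-liner above translates roughly to the following sketch (using the DataFrameReader API; the table name and zkUrl are taken from the question, and note the data source name is "org.apache.phoenix.spark" with no ".*" wildcard):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ECLoad")
                .master("local")
                .getOrCreate();

        // "org.apache.phoenix.spark" is the exact data source name -- not a wildcard
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();
        df.show();
    }
}
```

This only runs against a cluster where the two environment steps above (hbase-site.xml on the Spark conf path and the Phoenix client jar on both class paths) have been completed.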
Refer to: https://community.hortonworks.com/articles/179762/how-to-connect-to-phoenix-tables-using-spark2.html