
How to connect to Phoenix From Apache Spark 2.X Using Java

Surprisingly, I couldn't find any up-to-date Java documentation on the web for this. The one or two examples on the entire World Wide Web are too old. I came up with the following, which fails with the error 'Module not Found org.apache.phoenix.spark', but that module is definitely part of the jar. I don't think the following approach is right, because it is copy-pasted from different examples, and loading a module like this is a bit of an anti-pattern, as we already have the package as part of the jar. Please show me the right way.

Note: please no Scala or Python examples; they are easily available on the net.

import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoad {
    public static void main(String[] args){
        // Create a SparkSession to initialize Spark
        String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
        SparkSession spark = SparkSession
                .builder()
                .appName("ECLoad")
                .master("local")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .getOrCreate();

        spark.conf().set("spark.testing.memory", "2147480000"); // in case you face any memory issue

        // This is the line that fails with 'Module not Found org.apache.phoenix.spark'
        Dataset<Row> df = spark.sqlContext().read().format("org.apache.phoenix.spark.*")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();
        df.show();
    }
}

I'm trying to run it as

spark-submit --class "encountercases.ECLoad" --jars phoenix-spark-5.0.0-HBase-2.0.jar,phoenix-core-5.0.0-HBase-2.0.jar --master local ./PASpark-1.0-SNAPSHOT.jar

and I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

I can see the required jars are already at the suggested path and the hbase-site.xml symlink exists.

Before getting Phoenix working with Spark, you will need to set up the environment for Spark so that it knows how to access Phoenix/HBase.

  1. First create a symbolic link to hbase-site.xml: ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml. Alternatively, you can add this file while creating the Spark session or in the Spark defaults.

  2. You will need to add the jars under /usr/hdp/current/phoenix-client/ to the driver as well as the executor class path. The parameters to set are spark.driver.extraClassPath and spark.executor.extraClassPath (a submit-command sketch follows the reference below).

  3. This step is trivial and can easily be translated into Java/Scala/Python/R (a Java sketch follows the Scala line below); the two steps above are critical for it to work, as they set up the environment:

val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "CLINICAL.ENCOUNTER_CASES", "zkUrl" -> "localhost:2181"))
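
Since the question asks for Java, here is a minimal Java sketch of the same load for Spark 2.x, assuming the table name and ZooKeeper URL from the question. Note that the data source name is "org.apache.phoenix.spark", without the trailing ".*" used in the question's code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoadFixed { // illustrative class name
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("ECLoad")
                .master("local")
                .getOrCreate();

        // The data source name must be "org.apache.phoenix.spark" - the ".*"
        // suffix in the original code is what triggers "Module not Found".
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();

        df.show();
    }
}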

Refer to: https://community.hortonworks.com/articles/179762/how-to-connect-to-phoenix-tables-using-spark2.html
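
For steps 1 and 2, the submit command from the question might then look roughly like the following. This is only a sketch: the phoenix-client.jar file name under /usr/hdp/current/phoenix-client/ is an assumption based on the usual HDP layout, and adding /etc/hbase/conf to the class path is an alternative to the symlink from step 1:

spark-submit --class "encountercases.ECLoad" \
  --master local \
  --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf \
  --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf \
  ./PASpark-1.0-SNAPSHOT.jar

With the Phoenix client jar (which bundles the HBase classes) on both class paths, the NoClassDefFoundError for org/apache/hadoop/hbase/HBaseConfiguration should go away.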
