How to connect to Phoenix from Apache Spark 2.x using Java
Surprisingly, I couldn't find any up-to-date Java documentation for this on the web. The one or two examples on the entire World Wide Web are too old. I came up with the following, which fails with the error 'Module not Found org.apache.phoenix.spark', but that module is definitely part of the jar. I don't think this approach is right, because it is copy-pasted from different examples, and loading a module like this is a bit of an anti-pattern, since we already have the package as part of the jar. Please show me the right way.

Note: please do not give a Scala or Python example; those are easily available on the net.
import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoad {
    public static void main(String[] args) {
        // Create a SparkSession to initialize
        String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
        SparkSession spark = SparkSession
                .builder()
                .appName("ECLoad")
                .master("local")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .getOrCreate();
        spark.conf().set("spark.testing.memory", "2147480000"); // in case you face any memory issue

        Dataset<Row> df = spark.sqlContext().read()
                .format("org.apache.phoenix.spark.*")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();
        df.show();
    }
}
I'm trying to run it as
spark-submit --class "encountercases.ECLoad" --jars phoenix-spark-5.0.0-HBase-2.0.jar,phoenix-core-5.0.0-HBase-2.0.jar --master local ./PASpark-1.0-SNAPSHOT.jar
and I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
I see the required jars are already at the suggested path and the hbase-site.xml symlink exists.
Before getting Phoenix working with Spark, you will need to set up the environment for Spark so that it knows how to access Phoenix/HBase.
First, create a symbolic link to hbase-site.xml:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

Alternatively, you can add this file while creating the Spark session or in the Spark defaults.
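As a sketch of that alternative (assuming the same HDP-style paths as the symlink command; whether your jobs pick the file up this way depends on your distribution's classpath handling):

```shell
# Option A: point every job at hbase-site.xml via spark-defaults.conf
echo "spark.files file:///etc/hbase/conf/hbase-site.xml" >> /etc/spark2/conf/spark-defaults.conf

# Option B: ship the file with a single job at submit time
spark-submit --files /etc/hbase/conf/hbase-site.xml ...
```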
You will need to add the jars under /usr/hdp/current/phoenix-client/ to the driver as well as the executor class path. The parameters to set are spark.driver.extraClassPath and spark.executor.extraClassPath.
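Putting both parameters together, a submit command might look like this (a sketch only; the exact jar name under /usr/hdp/current/phoenix-client/ depends on your HDP build, so adjust the path for your cluster):

```shell
# Hypothetical path -- check what actually sits under /usr/hdp/current/phoenix-client/
PHOENIX_JAR=/usr/hdp/current/phoenix-client/phoenix-client.jar

spark-submit \
  --class encountercases.ECLoad \
  --master local \
  --conf spark.driver.extraClassPath="$PHOENIX_JAR" \
  --conf spark.executor.extraClassPath="$PHOENIX_JAR" \
  ./PASpark-1.0-SNAPSHOT.jar
```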
This step is trivial and can easily be translated into Java/Scala/Python/R; the two steps above are critical for it to work, since they set up the environment:

val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "CLINICAL.ENCOUNTER_CASES", "zkUrl" -> "localhost:2181"))
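Since the question asked for Java specifically, the Scala one-liner above translates roughly to the following sketch (using the DataFrameReader API; the table name and zkUrl are taken from the question, and note the data source name is "org.apache.phoenix.spark" with no ".*" wildcard):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ECLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ECLoad")
                .master("local")
                .getOrCreate();

        // "org.apache.phoenix.spark" is the exact data source name -- not a wildcard
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "CLINICAL.ENCOUNTER_CASES")
                .option("zkUrl", "localhost:2181")
                .load();
        df.show();
    }
}
```

This only runs against a cluster where the two environment steps above (hbase-site.xml on the Spark conf path and the Phoenix client jar on both class paths) have been completed.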
Refer to: https://community.hortonworks.com/articles/179762/how-to-connect-to-phoenix-tables-using-spark2.html