
Spark 3.x on HDP 3.1 in headless mode with Hive - Hive tables not found

How can I configure Spark 3.x on HDP 3.1 using the headless ( https://spark.apache.org/docs/latest/hadoop-provided.html ) version of Spark to interact with Hive?

First, I have downloaded and unzipped the headless Spark 3.x:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and add it below, replacing 3.1.x.x-xxx with it

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

spark.sql("show databases").show
// only showing default namespace, existing hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory // I want to see hive here - how? How can I get the hive jars onto the classpath?
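For reference, when Spark's Hive classes actually are on the classpath, Hive support is requested like this in a standalone application (a minimal sketch, assuming a build that ships spark-hive; inside spark-shell a session already exists, so this only illustrates what the headless build cannot do):

import org.apache.spark.sql.SparkSession

// enableHiveSupport() throws IllegalArgumentException if the Hive classes
// are missing - which is exactly the situation with the headless build above.
val sparkHive = SparkSession.builder()
  .appName("hive-catalog-check")
  .enableHiveSupport()  // sets spark.sql.catalogImplementation=hive
  .getOrCreate()

sparkHive.conf.get("spark.sql.catalogImplementation")  // expect: hive
sparkHive.sql("show databases").show()                 // existing Hive databases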

NOTE

This is an updated version of "How can I run spark in headless mode in my custom version on HDP?" for Spark 3.x on HDP 3.1, and of "custom spark does not find hive databases when running on yarn".

Furthermore: I am aware of the problems with ACID hive tables in spark. For now, I simply want to be able to see the existing databases.

edit

We must get the hive jars onto the class path. Trying as follows:

 export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"
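(Aside: a JVM classpath wildcard must be the whole base name of an entry, i.e. lib/* rather than lib*, so the intended form is presumably the following - a sketch only, and since the missing class in the error below is a Spark class rather than a Hive client jar, this alone may not help:)

export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib/*:${SPARK_DIST_CLASSPATH}"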

And now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

I.e. the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}" had no effect (the same failure occurs if it is not set).
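One way to verify whether such an export even reaches the launched JVM (a debugging sketch, not part of the original attempt) is to print the variable and let the spark-class launcher echo the full java command, including its classpath, via SPARK_PRINT_LAUNCH_COMMAND:

echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | grep -i hive
SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-submit --version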

As noted above, and in "custom spark does not find hive databases when running on yarn", the Hive JARs are needed. They are not supplied in the headless version.

I was unable to retrofit these.
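For reference, one retrofit that can be attempted is pulling Spark's own Hive modules in at submit time via --packages (a sketch only, assuming internet access from the edge node and a Scala 2.12 build; in my case no retrofit worked, hence the solution below):

./bin/spark-shell --master yarn --queue myqueue --packages org.apache.spark:spark-hive_2.12:3.0.0 --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml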

Solution: instead of worrying, simply use the Spark build that bundles Hadoop 3.2 (on HDP 3.1).
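A minimal sketch of that setup (assuming the stock spark-3.0.0-bin-hadoop3.2 download; queue and hdp.version placeholders as above - no SPARK_DIST_CLASSPATH export is needed here, since Hadoop and the Hive support classes are bundled in this build):

cd ~/development/software/spark-3.0.0-bin-hadoop3.2
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

Inside the shell, spark.conf.get("spark.sql.catalogImplementation") should now return hive, and spark.sql("show databases").show should list the existing Hive databases.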
