How to configure Hive to use Spark execution engine on Google Dataproc?

Original 2017-04-10 12:01:03 4 1 apache-spark/ hive/ google-cloud-dataproc

I'm trying to configure Hive, running on Google Dataproc image v1.1 (so Hive 2.1.0 and Spark 2.0.2), to use Spark as an execution engine instead of the default MapReduce one.

Following the instructions at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started doesn't really help: as soon as I set hive.execution.engine=spark, every query fails with "Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable".

Does anyone know the specific steps to get this running on Dataproc? From what I can tell it should just be a question of making Hive see the right JARs, since both Hive and Spark are already installed and configured on the cluster, and using Hive from Spark (so the other way around) works fine.
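One way to check whether Hive can actually see the Scala runtime is to look for the missing class inside the JARs on each side. The snippet below is only a minimal sketch: the function name find_jars_with_class is made up for illustration, and the directories to scan are assumptions that depend on how the image lays out Hive and Spark.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(jar_dir, class_name):
    """Return the sorted file names of JARs in jar_dir that contain class_name.

    class_name may use the slash form from the error message
    ("scala/collection/Iterable") or the dotted form.
    """
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(jar.name)
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable files, e.g. dangling symlinks or corrupt archives.
            continue
    return matches
```

Calling find_jars_with_class on Hive's lib directory and then on Spark's jars directory (for example /usr/lib/hive/lib and /usr/lib/spark/jars, both assumed paths) would show which side ships scala/collection/Iterable. An empty result for the Hive side would be consistent with the NoClassDefFoundError above.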

1 answer

This will probably not work with the jars in a Dataproc cluster. In Dataproc, Spark is compiled with Hive bundled (-Phive), a configuration that Hive on Spark neither recommends nor supports.

If you really want to run Hive on Spark, you might want to try bringing your own Spark build, compiled as described in the wiki, and installing it through an initialization action.

If you just want to get Hive off MapReduce on Dataproc, running Tez with this initialization action would probably be easier.
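For either route, the initialization action is passed when the cluster is created. The following is a rough sketch that only assembles the gcloud command line: the helper name, the cluster name, and the gs:// script location are all placeholders, not values taken from the answer, so substitute the real script URI.

```python
import shlex


def build_create_cluster_command(cluster_name, init_action_uri,
                                 image_version="1.1", region=None):
    """Build the argument list for creating a Dataproc cluster that runs
    an initialization action (e.g. a script installing Tez or a custom Spark)."""
    args = [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--image-version", image_version,
        "--initialization-actions", init_action_uri,
    ]
    if region is not None:
        # Newer gcloud releases expect an explicit region.
        args += ["--region", region]
    return args


# Placeholder values: replace with your own cluster name and script location.
command = build_create_cluster_command("hive-demo", "gs://my-bucket/init-action.sh")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run. Building the arguments as a list avoids quoting problems if the cluster name or URI ever contains unusual characters.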

How to configure the Hive cli when using the Spark execution engine?

Spark as execution engine with Hive

How to configure Hive to use Spark?

Hive on Spark and Spark as hive execution engine: What's the difference

Setting Spark as default execution engine for Hive

Not able to make Spark as Hive execution engine

How to run spark 3.2.0 on google dataproc?

hive execution engine - Spark - Failed to create spark client

How do I get a spark job to use all available resources on a Google Cloud DataProc cluster?

How to enable pyspark HIVE support on Google Dataproc master node

solution1 2 2017-04-11 00:27:44
