
Hive on Spark: Missing <spark-assembly*.jar>

I'm running Hive 2.1.1, Spark 2.1.0 and Hadoop 2.7.3.

I tried to build Spark following the Hive on Spark: Getting Started guide:

./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

However, I couldn't find any spark-assembly jar files under the Spark directory (find . -name "spark-assembly*.jar" returns nothing). Instead of linking the spark-assembly jar into HIVE_HOME/lib, I tried export SPARK_HOME=/home/user/spark.

I get the following Hive error in beeline:

0: jdbc:hive2://localhost:10000> set hive.execution.engine=spark;
0: jdbc:hive2://localhost:10000> insert into test (id, name) values (1, 'test1');
Error: Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable (state=,code=0)

I think the error is caused by missing spark-assembly jars.

How could I build / Where could I find those spark-assembly jar files?

How could I fix the above error?

Thank you!

First of all, Spark no longer builds a spark-assembly.jar as of 2.0.0; instead, it places all the dependency jars in the directory $SPARK_HOME/jars.
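
Since there is no single assembly to link into HIVE_HOME/lib anymore, one workaround (following the newer Hive on Spark instructions for Spark 2.x) is to link the individual jars Hive needs. A minimal sketch with illustrative jar names and versions; use the ones actually present in your own $SPARK_HOME/jars (the scala-library jar is relevant to your error, since scala.collection.Iterable lives in that jar):

# link the individual Spark jars onto Hive's classpath (names/versions are examples)
ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-core_2.11-2.1.0.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.1.0.jar $HIVE_HOME/lib/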

Besides, Hive does not support every version of Spark; in fact it has strict version compatibility restrictions for running Hive on Spark. Depending on which version of Hive you're using, you can always find the corresponding Spark version in Hive's pom.xml file. For Hive 2.1.1, the Spark version specified in pom.xml is:

<spark.version>1.6.0</spark.version>
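
A quick way to check this from the root of the Hive source tree (assuming you have the source checked out):

grep '<spark.version>' pom.xml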

As you already know, you need to build Spark without Hive support. I don't know why, but the command in Hive on Spark - Getting Started did not work for me; I finally succeeded with the following command:

mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
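
After Spark is built, Hive also needs to know where that Spark lives and how to submit jobs. A minimal sketch, assuming YARN and an illustrative SPARK_HOME path; the spark.* property names come from the Hive on Spark guide, and the values are only examples:

# before starting HiveServer2 / Hive CLI
export SPARK_HOME=/home/user/spark-without-hive   # illustrative path to the Spark build
# then, per session in beeline:
0: jdbc:hive2://localhost:10000> set hive.execution.engine=spark;
0: jdbc:hive2://localhost:10000> set spark.master=yarn;
0: jdbc:hive2://localhost:10000> set spark.executor.memory=2g;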

And a few other troubleshooting tips for problems I ran into before (hopefully you won't):

  • Starting the Spark master failed because slf4j- or Hadoop-related classes could not be found: run export SPARK_DIST_CLASSPATH=$(hadoop classpath) and try again (a way to make this setting permanent is sketched after this list).
  • Failed to load snappy native libs: this happens when there is no snappy dependency on the classpath, or the snappy lib on the Hadoop classpath is not the version Spark expects. Download a correct version of the snappy lib, put it under $SPARK_HOME/lib/, run export SPARK_DIST_CLASSPATH=$SPARK_HOME/lib/*:$(hadoop classpath), and try again.
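
To avoid re-exporting SPARK_DIST_CLASSPATH in every shell, the same setting can be made permanent in Spark's environment file. A minimal sketch, assuming the standard conf/spark-env.sh mechanism:

# $SPARK_HOME/conf/spark-env.sh (copy from spark-env.sh.template if it does not exist yet)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)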

Hope this is helpful and everything goes well for you.

Yes, the spark-assembly.jar file is no longer built from Spark 2.0.0 onwards. The individual, smaller jar files are available in the jars directory.
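
For example, on a Spark 2.1.0 distribution you would see something like the following (exact names and versions depend on your build; these are only illustrative):

$ ls $SPARK_HOME/jars
...
scala-library-2.11.8.jar
spark-core_2.11-2.1.0.jar
spark-network-common_2.11-2.1.0.jar
spark-sql_2.11-2.1.0.jar
...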

https://issues.apache.org/jira/browse/SPARK-11157

https://issues.apache.org/jira/secure/attachment/12767129/no-assemblies.pdf

find . -iname '*spark*'

won't find any Spark-related jar.

However, I'm using Hive 2.1.0 installed via brew on a Mac, and the problem persists.

Have a look at

Hive on Spark: Getting Started
