
Unable to execute Hive queries using spark-submit

I am not able to run Hive queries with the spark-submit command, but the same queries execute fine in spark-shell. I am using AWS EMR as the cluster.

Below is my code, written in the Eclipse Scala IDE:

import org.apache.spark.{SparkConf, SparkContext}

object HiveTest {

  def main(args: Array[String]): Unit = {

    val sparkConf = new SparkConf()
    sparkConf.setAppName("WordCountTest")

    val sc = new SparkContext(sparkConf)

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext.implicits._

    sqlContext.sql("select * from stream_table")
  }
}

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>spark</groupId>
    <artifactId>word-count</artifactId>
    <version>0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>word-count</name>
    <url>http://maven.apache.org</url>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <archive>
                                <manifest>
                                    <mainClass>HiveTest</mainClass>
                                </manifest>
                            </archive>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <properties>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.8</scala.version>
        <scala.tools.version>2.11</scala.tools.version>
        <spark.version>2.0.0</spark.version>
    </properties>


    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
            <version>0.90.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

spark-submit command

spark-submit --master local[2] --class HiveTest ./word-count-0.1-SNAPSHOT-jar-with-dependencies.jar

Error

[hadoop@ip-10-134-23-168 jars]$ spark-submit --master local[2] --class HiveTest ./word-count-0.1-SNAPSHOT-jar-with-dependencies.jar
18/02/12 10:58:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/02/12 10:58:49 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

Since the Spark version is 2.0, it is better to use a SparkSession object instead of SparkContext or SQLContext, and you have to create the SparkSession with Hive support enabled.
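
A minimal sketch of the job rewritten that way (only the stream_table name is taken from the question; the rest is illustrative):

import org.apache.spark.sql.SparkSession

object HiveTest {

  def main(args: Array[String]): Unit = {

    // A single SparkSession with Hive support replaces the separate
    // SparkContext and HiveContext used in the Spark 1.x style.
    val spark = SparkSession.builder()
      .appName("HiveTest")
      .enableHiveSupport()
      .getOrCreate()

    // Same query as before, now issued through the session.
    spark.sql("select * from stream_table").show()

    spark.stop()
  }
}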

It runs in spark-shell because the spark session (spark) and sc there are created with Hive support already enabled.
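
That is why the same query needs no extra setup in the shell; a quick check there (assuming the stream_table Hive table exists) would be:

scala> spark.sql("select * from stream_table").show()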

The reason for the failure is the classpath: when I run spark-submit with the jar-with-dependencies jar, the default classpath of Spark is not being utilized. Adding the provided scope to the dependencies in the POM resolved the issue.

Dependencies with the provided scope are not added to the assembled jar (word-count-0.1-SNAPSHOT-jar-with-dependencies.jar here); they are used only for compilation.

Changed pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>spark</groupId>
    <artifactId>word-count</artifactId>
    <version>0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>word-count</name>
    <url>http://maven.apache.org</url>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <archive>
                                <manifest>
                                    <mainClass>HiveTest</mainClass>
                                </manifest>
                            </archive>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <properties>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.8</scala.version>
        <scala.tools.version>2.11</scala.tools.version>
        <spark.version>2.1.0</spark.version>
    </properties>


    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
            <version>0.90.0</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>

spark-submit command

spark-submit --master local[2] --class HiveTest ./word-count-0.1-SNAPSHOT-jar-with-dependencies.jar
