
Spark with Hive: Unable to instantiate SparkSession with Hive support because Hive classes are not found

The Spark application loads data from Hive:

    import org.apache.spark.sql.SparkSession;

    // Build a session that registers with the Hive metastore on device1
    SparkSession spark = SparkSession.builder()
        .appName(topics)
        .config("hive.metastore.uris", "thrift://device1:9083")
        .enableHiveSupport()
        .getOrCreate();

I start Spark in the following way:

spark-submit --master local[*] --class zhihu.SparkConsumer target/original-kafka-consumer-0.1-SNAPSHOT.jar  --jars spark-hive_2.11-2.4.4.jar
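As an aside: spark-submit treats everything after the application jar as arguments to the main class, so the --jars option as written above is handed to zhihu.SparkConsumer rather than to spark-submit itself. If the Hive jar is meant to go on the classpath this way, the option has to come before the application jar; a corrected sketch, using the same paths:

    spark-submit --master local[*] --class zhihu.SparkConsumer \
        --jars spark-hive_2.11-2.4.4.jar \
        target/original-kafka-consumer-0.1-SNAPSHOT.jar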

The Maven pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.zhihu</groupId>
  <artifactId>kafka-consumer</artifactId>
  <packaging>jar</packaging>
  <version>0.1-SNAPSHOT</version>
  <name>kafkadev</name>
  <url>http://maven.apache.org</url>
  <repositories>
    <repository>
      <!-- Proper URL for Cloudera maven artifactory -->
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

  </repositories>

  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core -->
    <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.8.2</version>
    </dependency>

    <!-- Spark dependencies -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.4</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.4</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.4.4</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.4.4</version>
    </dependency>

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>2.1.0</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>
          <groupId>org.apache.logging.log4j</groupId>
          <artifactId>log4j-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.log4j</groupId>
          <artifactId>log4j-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- gson -->
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.8.2</version>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>2.1.1-cdh6.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-service</artifactId>
      <version>2.1.1-cdh6.2.0</version>
    </dependency>

    <!-- runtime Hive -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-common</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-beeline</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-shims</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-serde</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-contrib</artifactId>
      <version>2.1.1-cdh6.2.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.7.0</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>**/Log4j2Plugins.dat</exclude>
                  </excludes>
                </filter>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <artifactSet>
                <excludes>
                  <exclude>classworlds:classworlds</exclude>
                  <exclude>junit:junit</exclude>
                  <exclude>jmock:*</exclude>
                  <exclude>*:xml-apis</exclude>
                  <exclude>org.apache.maven:lib:tests</exclude>
                </excludes>
              </artifactSet>
              <skip>true</skip>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

It looks fine, but it always raises:

20/05/07 12:03:17 INFO spark.SparkContext: Added JAR file:/data/projects/zhihu_scraper/consumers/target/original-kafka-consumer-0.1-SNAPSHOT.jar at spark://device2:42395/jars/original-kafka-consumer-0.1-SNAPSHOT.jar with timestamp 1588824197724
20/05/07 12:03:17 INFO executor.Executor: Starting executor ID driver on host localhost
20/05/07 12:03:17 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33849.
20/05/07 12:03:17 INFO netty.NettyBlockTransferService: Server created on device2:33849
20/05/07 12:03:17 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager device2:33849 with 366.3 MB RAM, BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63e5e5b4{/metrics/json,null,AVAILABLE,@Spark}
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
    at zhihu.SparkConsumer.main(SparkConsumer.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/07 12:03:18 INFO spark.SparkContext: Invoking stop() from shutdown hook
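The check that throws here is a plain class-loading probe: enableHiveSupport only verifies that Spark's Hive integration classes can be loaded on the driver. A minimal diagnostic sketch (the two class names are the ones Spark 2.4 probes for; this snippet is illustrative and not part of the original post):

    // Mimics the lookup behind enableHiveSupport: if either call throws
    // ClassNotFoundException, spark-hive is missing from the runtime classpath.
    public class HiveClasspathCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.spark.sql.hive.HiveSessionStateBuilder");
            Class.forName("org.apache.hadoop.hive.conf.HiveConf");
            System.out.println("Hive classes are present");
        }
    }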

I have tried all the answers in this post: How to create SparkSession with Hive support. However, none of them work for me.

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>

Regarding the spark-hive dependency quoted above: I don't know why its scope is compile; it should be runtime. Since you are using the maven-shade-plugin, you can package an uber jar (target/original-kafka-consumer-0.1-SNAPSHOT.jar) with all the dependencies in one umbrella archive, and it will be on the classpath so that nothing is missed. Try this.
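A sketch of that workflow, with one caveat worth flagging: under the shade plugin's default settings the jar that actually contains the dependencies is the one without the original- prefix, since target/original-kafka-consumer-0.1-SNAPSHOT.jar is the plugin's backup of the unshaded jar:

    mvn clean package
    # submit the shaded (uber) jar, not the original-* backup
    spark-submit --master local[*] --class zhihu.SparkConsumer \
        target/kafka-consumer-0.1-SNAPSHOT.jar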

hive-site.xml should also be on the classpath. Then there is no need to configure the metastore URIs programmatically.
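For example (the paths below are assumptions about a typical CDH layout, not taken from the question), copying the cluster's Hive client configuration into Spark's conf directory puts it on the driver's classpath:

    # assumed paths; adjust to the actual cluster layout
    cp /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/"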
