简体   繁体   中英

Why "java.lang.ClassNotFoundException: Failed to find data source: kinesis" with spark-streaming-kinesis-asl dependency?

My setup:

  scala:2.11.8
  spark:2.3.0.cloudera4

I have already add this in my .pom file:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
  <version>2.3.0</version>
</dependency>

However, when I run my spark-streaming code to consume data from kinesis, it returns:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kinesis.

I got a similar error when I consume data from Kafka and solved it by indicating the dependent jar in the submit command. But it seems this doesn't work this time:

sudo -u hdfs spark2-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.0 --class com.package.newkinesis --master yarn  sparktest-1.0-SNAPSHOT.jar 

How to address this issue? Any help is appreciated.

My code:

val spark = SparkSession
      .builder.master("local[4]")
      .appName("SpeedTester")
      .config("spark.driver.memory", "3g")
      .getOrCreate()

    val kinesis = spark.readStream
      .format("kinesis")
      .option("streamName", kinesisStreamName)
      .option("endpointUrl", kinesisEndpointUrl)
      .option("initialPosition", "TRIM_HORIZON")
      .option("awsAccessKey", awsAccessKeyId)
      .option("awsSecretKey", awsSecretKey)
      .load()

    kinesis.writeStream.format("console").start().awaitTermination()

My full .pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.netease</groupId>
  <artifactId>sparktest</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.11.8</scala.version>
  </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <includes>
                                <include>org/apache/spark/*</include>
                            </includes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
        <scope>provided</scope>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
        <scope>provided</scope>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
        <scope>provided</scope>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
      <version>2.3.0</version>
    </dependency>
  </dependencies>
</project>

tl;dr It won't work.

You use spark-streaming-kinesis-asl_2.11 dependency that is for the older Spark Streaming API with the new Spark Structured Streaming and hence the exception.

You have to find a compatible Spark Structured Streaming data source for AWS Kinesis which is not officially supported by the Apache Spark project.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM