
Structured Spark Streaming throwing java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.internalCreateDataFrame

I'm getting started with Structured Spark Streaming using a Kafka source, following a simple tutorial. My Kafka server is open-source (OSS) Kafka. My producer source code is as follows:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class LogGenerator {
    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        prop.load(ClassLoader.getSystemResourceAsStream("kafka-producer.properties"));

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(prop);

        for (int i = 0; i < 10000; i++) {
            System.out.printf("%d ", i);
            // Key and value are both the loop index, as strings.
            producer.send(new ProducerRecord<>("my-log-topic", Integer.toString(i), Integer.toString(i)));
        }

        // Flush buffered records and release resources before the JVM exits;
        // without this, trailing records may never reach the broker.
        producer.flush();
        producer.close();
    }
}

The producer just writes the key/value pairs (0,0) through (9999,9999).
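
To sanity-check that the topic actually holds the data, a console consumer can be pointed at it (a quick check assuming a standard Kafka installation; adjust the bootstrap server to your setup):

kafka-console-consumer.sh --bootstrap-server 0.0.0.0:9092 --topic my-log-topic --from-beginning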

The Structured Streaming code that reads from this topic is as follows:

import org.apache.spark.sql.SparkSession

object DimaStreamer {

  //initialize configs

  val spark: SparkSession = SparkSession.builder().master("local").appName("StreamingData").getOrCreate()

  def main(args: Array[String]): Unit = {
    val dfRefundStream = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "0.0.0.0:9091,0.0.0.0:9092")
      .option("subscribe", "my-log-topic")
      .load()

    import org.apache.spark.sql.functions._
    dfRefundStream.printSchema()
    dfRefundStream.select(col("value")).writeStream
      .format("console")
      .queryName("inittest")
      .start()
      .awaitTermination()

  }
}

It's a Maven project. The job is submitted as follows, with the --jars option passing the dependency jars as a comma-separated list:

spark-submit --class com.apple.arcadia.solr.batch.DimaStreamer --jars $JARSPATH target/mainproject-1.0-SNAPSHOT.jar
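
As an aside, the Spark version of the installation that actually executes spark-submit can be checked with the standard CLI flag:

spark-submit --version

This turns out to matter here, because the runtime Spark version must be binary-compatible with the Spark libraries on the classpath.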

The job throws the following exception:

19/06/13 15:10:05 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [my-log-topic-1, my-log-topic-0] for group spark-kafka-source-0ce55060-09ad-430a-9916-e0063d8103fe--48657453-driver-0
19/06/13 15:10:06 INFO kafka010.KafkaSource: Initial offsets: {"my-log-topic":{"1":9878,"0":10122}}
19/06/13 15:10:06 INFO streaming.StreamExecution: Committed offsets for batch 0. Metadata OffsetSeqMetadata(0,1560463806021,Map(spark.sql.shuffle.partitions -> 200))
19/06/13 15:10:06 INFO kafka010.KafkaSource: GetBatch called with start = None, end = {"my-log-topic":{"1":9878,"0":10122}}
19/06/13 15:10:06 INFO kafka010.KafkaSource: Partitions added: Map()
19/06/13 15:10:06 INFO kafka010.KafkaSource: GetBatch generating RDD of offset range: KafkaSourceRDDOffsetRange(my-log-topic-0,10122,10122,None), KafkaSourceRDDOffsetRange(my-log-topic-1,9878,9878,None)
19/06/13 15:10:06 ERROR streaming.StreamExecution: Query inittest [id = 305e10aa-a446-4f42-a4e9-8d2372250fa8, runId = 2218049f-08a4-40bf-9b46-b7e898321b85] terminated with error
java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.internalCreateDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;Z)Lorg/apache/spark/sql/Dataset;
    at org.apache.spark.sql.kafka010.KafkaSource.getBatch(KafkaSource.scala:301)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:614)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:610)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:609)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Exception in thread "stream execution thread for inittest [id = 305e10aa-a446-4f42-a4e9-8d2372250fa8, runId = 2218049f-08a4-40bf-9b46-b7e898321b85]" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.internalCreateDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;Z)Lorg/apache/spark/sql/Dataset;
    at org.apache.spark.sql.kafka010.KafkaSource.getBatch(KafkaSource.scala:301)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:614)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:610)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:609)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: org.apache.spark.sql.SQLContext.internalCreateDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;Z)Lorg/apache/spark/sql/Dataset;
=== Streaming Query ===
Identifier: inittest [id = 305e10aa-a446-4f42-a4e9-8d2372250fa8, runId = 2218049f-08a4-40bf-9b46-b7e898321b85]
Current Committed Offsets: {}
Current Available Offsets: {KafkaSource[Subscribe[my-log-topic]]: {"my-log-topic":{"1":9878,"0":10122}}}

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
Project [value#1]
+- StreamingExecutionRelation KafkaSource[Subscribe[my-log-topic]], [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]

    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:343)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.internalCreateDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;Z)Lorg/apache/spark/sql/Dataset;
    at org.apache.spark.sql.kafka010.KafkaSource.getBatch(KafkaSource.scala:301)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:614)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:610)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:610)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:609)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
    ... 1 more

Based on the exception, it looks as if the kafka010.KafkaSource class is calling a method with the signature SQLContext.internalCreateDataFrame(rdd, schema) that cannot be found at runtime.

Upon checking the source code here: https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala

I can see that the method exists. Has anyone seen and resolved such an issue? If so, could you please help me understand what the problem was?
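
A java.lang.NoSuchMethodError at runtime usually means the code was compiled against one version of a class but is running against a different, binary-incompatible one. One way to confirm this (a diagnostic sketch; the jar path is illustrative and depends on your installation) is to list the methods of SQLContext in the spark-sql jar the runtime actually loads:

javap -cp /path/to/spark-sql_2.11-<version>.jar org.apache.spark.sql.SQLContext | grep internalCreateDataFrame

If the grep prints nothing, the runtime jar does not expose the method that the Kafka source was compiled against.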

Finally, the relevant dependencies in the POM are:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.3.1</version>
      <classifier>tests</classifier>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.3.1</version>
    </dependency>
      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
      <version>2.3.1</version>
    </dependency>

I was able to resolve this. I noticed that the Spark version of the installation I was running spark-submit from was 2.2, while the libraries (as you can see above) were all 2.3.1. Once I aligned the Spark versions, the job ran successfully.
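
To keep the versions from drifting apart again, one option (a sketch, not part of the original fix) is to declare the Spark version once as a Maven property and reference it from every Spark artifact:

<properties>
  <spark.version>2.3.1</spark.version>
</properties>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>

The same ${spark.version} placeholder then replaces the hard-coded 2.3.1 in the other Spark dependencies, so a single edit keeps them all in lockstep with the cluster.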
