Apache Spark with kafka stream - Missing Kafka
I am trying to set up Apache Spark with Kafka and wrote a simple program locally, but it fails, and I cannot figure out the cause from debugging.
build.gradle.kts
implementation ("org.jetbrains.kotlin:kotlin-stdlib:1.4.0")
implementation ("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.0.0_2.12:1.0.0-preview1")
compileOnly("org.apache.spark:spark-sql_2.12:3.0.0")
implementation("org.apache.kafka:kafka-clients:3.0.0")
The main function code is:
val spark = SparkSession
.builder()
.master("local[*]")
.appName("Ship metrics").orCreate
val shipmentDataFrame = spark
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "test")
.option("includeHeaders", "true")
.load()
val query = shipmentDataFrame.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
query.writeStream()
.format("console")
.outputMode("append")
.start()
.awaitTermination()
and I get this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:194)
at com.tgt.ff.axon.shipmetriics.stream.ShipmentStream.run(ShipmentStream.kt:23)
at com.tgt.ff.axon.shipmetriics.ApplicationKt.main(Application.kt:12)
21/12/25 22:22:56 INFO SparkContext: Invoking stop() from shutdown hook
JetBrains' Kotlin API for Spark (https://github.com/Kotlin/kotlin-spark-api) has supported streaming since the 1.1.0 release. There is also a Kafka stream example that may help you: https://github.com/Kotlin/kotlin-spark-api/blob/spark-3.2/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/streaming/KotlinDirectKafkaWordCount.kt
It does use the Spark DStream API, rather than the Spark Structured Streaming API that you appear to be using.
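The linked example can be boiled down to a short sketch. This is an illustrative adaptation, not the example verbatim: the topic name, group id, and batch interval are assumptions, and it requires `org.apache.spark:spark-streaming-kafka-0-10_2.12` on the classpath.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.streaming.kafka010.ConsumerStrategies
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies

fun main() {
    val conf = SparkConf().setMaster("local[*]").setAppName("Ship metrics")
    // DStreams work on micro-batches; 5 seconds is an arbitrary interval
    val ssc = JavaStreamingContext(conf, Durations.seconds(5))

    val kafkaParams = mapOf<String, Any>(
        ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG to "localhost:9092",
        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG to StringDeserializer::class.java,
        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG to StringDeserializer::class.java,
        ConsumerConfig.GROUP_ID_CONFIG to "ship-metrics"  // assumed group id
    )

    // Subscribe directly to the Kafka topic; no extra data-source lookup
    // is involved, which is why the "Failed to find data source" error
    // cannot occur on this path.
    val stream = KafkaUtils.createDirectStream(
        ssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.Subscribe<String, String>(listOf("test"), kafkaParams)
    )

    stream.map { record -> record.key() to record.value() }.print()

    ssc.start()
    ssc.awaitTermination()
}
```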
Of course, you can still use Structured Streaming if you prefer, but then the application needs to be deployed as described in the Structured Streaming + Kafka Integration Guide.
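Concretely, the `Failed to find data source: kafka` error means the Kafka connector for Structured Streaming is not on the classpath; `kafka-clients` alone does not provide it. Assuming Spark 3.0.0 with Scala 2.12, as in the build file above, the missing dependency would be:

```kotlin
// build.gradle.kts — the "kafka" data source for Structured Streaming
// lives in its own artifact, separate from spark-sql and kafka-clients.
// The Scala suffix (_2.12) and version must match your spark-sql dependency.
implementation("org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0")
```

Alternatively, when submitting with `spark-submit`, the same artifact can be supplied via `--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0`, which is the deployment approach the integration guide describes.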