Apache Spark with kafka stream - Missing Kafka
I am trying to set up Apache Spark with Kafka and wrote a simple program locally, but it fails, and I cannot figure out the cause from debugging.
build.gradle.kts
implementation ("org.jetbrains.kotlin:kotlin-stdlib:1.4.0")
implementation ("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.0.0_2.12:1.0.0-preview1")
compileOnly("org.apache.spark:spark-sql_2.12:3.0.0")
implementation("org.apache.kafka:kafka-clients:3.0.0")
The main function code is:
val spark = SparkSession
.builder()
.master("local[*]")
.appName("Ship metrics").orCreate
val shipmentDataFrame = spark
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "test")
.option("includeHeaders", "true")
.load()
val query = shipmentDataFrame.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
query.writeStream()
.format("console")
.outputMode("append")
.start()
.awaitTermination()
and I get this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:194)
at com.tgt.ff.axon.shipmetriics.stream.ShipmentStream.run(ShipmentStream.kt:23)
at com.tgt.ff.axon.shipmetriics.ApplicationKt.main(Application.kt:12)
21/12/25 22:22:56 INFO SparkContext: Invoking stop() from shutdown hook
JetBrains' Kotlin API for Spark (https://github.com/Kotlin/kotlin-spark-api) has supported streaming since the 1.1.0 release. There is also a Kafka stream example that may help you: https://github.com/Kotlin/kotlin-spark-api/blob/spark-3.2/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/streaming/KotlinDirectKafkaWordCount.kt
It does use the Spark DStream API, rather than the Spark Structured Streaming API that you appear to be using.
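The linked example can be boiled down to a short sketch. This is an illustrative adaptation, not the example verbatim: the topic name, group id, and batch interval are assumptions, and it requires `org.apache.spark:spark-streaming-kafka-0-10_2.12` on the classpath.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.streaming.kafka010.ConsumerStrategies
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies

fun main() {
    val conf = SparkConf().setMaster("local[*]").setAppName("Ship metrics")
    // DStreams work on micro-batches; 5 seconds is an arbitrary interval
    val ssc = JavaStreamingContext(conf, Durations.seconds(5))

    val kafkaParams = mapOf<String, Any>(
        ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG to "localhost:9092",
        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG to StringDeserializer::class.java,
        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG to StringDeserializer::class.java,
        ConsumerConfig.GROUP_ID_CONFIG to "ship-metrics"  // assumed group id
    )

    // Subscribe directly to the Kafka topic; no extra data-source lookup
    // is involved, which is why the "Failed to find data source" error
    // cannot occur on this path.
    val stream = KafkaUtils.createDirectStream(
        ssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.Subscribe<String, String>(listOf("test"), kafkaParams)
    )

    stream.map { record -> record.key() to record.value() }.print()

    ssc.start()
    ssc.awaitTermination()
}
```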
Of course, you can still use Structured Streaming if you prefer, but then the application needs to be deployed as described in the Structured Streaming + Kafka Integration Guide.
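Concretely, the `Failed to find data source: kafka` error means the Kafka connector for Structured Streaming is not on the classpath; `kafka-clients` alone does not provide it. Assuming Spark 3.0.0 with Scala 2.12, as in the build file above, the missing dependency would be:

```kotlin
// build.gradle.kts — the "kafka" data source for Structured Streaming
// lives in its own artifact, separate from spark-sql and kafka-clients.
// The Scala suffix (_2.12) and version must match your spark-sql dependency.
implementation("org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0")
```

Alternatively, when submitting with `spark-submit`, the same artifact can be supplied via `--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0`, which is the deployment approach the integration guide describes.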