[英]read a data from kafka topic and aggregate using spark tempview?
[英]unable to read kafka topic data using spark
我在我創建的名為"sampleTopic"
的主題之一中有如下數據
sid,Believer
其中第一個參數是username
,第二個參數是用戶經常聽的song name
。 現在,我已經啟動了zookeeper
、 Kafka server
和producer
,主題名稱如上所述。 我已經使用CMD
為該主題輸入了上述數據。 現在,我想閱讀 spark 中的主題執行一些聚合,並將其寫回 stream。 下面是我的代碼:
package com.sparkKafka
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
object SparkKafkaTopic {
def main(args: Array[String]) {
val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
println("hey")
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "sampleTopic1")
.load()
val query = df.writeStream
.outputMode("append")
.format("console")
.start().awaitTermination()
}
}
但是,當我執行上面的代碼時,它給出了:
+----+--------------------+------------+---------+------+--------------------+-------------+
| key| value| topic|partition|offset| timestamp|timestampType|
+----+--------------------+------------+---------+------+--------------------+-------------+
|null|[73 69 64 64 68 6...|sampleTopic1| 0| 4|2020-05-31 12:12:...| 0|
+----+--------------------+------------+---------+------+--------------------+-------------+
也有無限循環消息
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
根據 Srinivas 的建議修改后,我得到了以下 output:
不知道這里到底出了什么問題。 請指導我完成它。
spark-sql-kafka jar 丟失,它具有“kafka”數據源的實現。
您可以使用配置選項添加 jar 或構建包含 spark-sql-kafka jar 的胖 jar。 請使用jar相關版本
val spark = SparkSession.builder()
.appName("SparkKafka").master("local[*]")
.config("spark.jars","/path/to/spark-sql-kafka-xxxxxx.jar")
.getOrCreate()
嘗試將spark-sql-kafka
庫添加到您的構建文件中。 檢查下面。
構建.sbt
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"
// Change to Your spark version
pom.xml
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.11</artifactId>
<version>2.3.0</version> // Change to Your spark version
</dependency>
更改您的代碼,如下所示
package com.sparkKafka
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
case class KafkaMessage(key: String, value: String, topic: String, partition: Int, offset: Long, timestamp: String)
object SparkKafkaTopic {
def main(args: Array[String]) {
//val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
println("hey")
val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
import spark.implicits._
val mySchema = StructType(Array(
StructField("userName", StringType),
StructField("songName", StringType)))
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "sampleTopic1")
.load()
val query = df
.as[KafkaMessage]
.select(split($"value", ",")(0).as("userName"),split($"value", ",")(1).as("songName"))
.writeStream
.outputMode("append")
.format("console")
.start()
.awaitTermination()
}
}
/*
+------+--------+
|userid|songname|
+------+--------+
| sid|Believer|
+------+--------+
*/
}
}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.