I have three partitions for my Kafka topic, and I was wondering whether I could read from just one of the three. My consumer is a Spark Structured Streaming application.
Below are my existing Kafka settings in Spark.
val inputDf = spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", brokers)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.load()
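Here is how you can read from a specific partition: replace the subscribe option with assign, which takes a JSON string mapping a topic name to a list of partition numbers. In the example, "topic" stands for your actual topic name.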
val inputDf = spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", brokers)
.option("assign", """{"topic":[0]}""")
.option("startingOffsets", "latest")
.load()
PS: To read from multiple partitions instead of one, list every partition number in the array, for example """{"topic":[0,1,2]}""".
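Rather than hand-writing the JSON, the assign string can be built from the topic name and the partition numbers. This is only a minimal sketch: the helper name assignJson and the topic name "events" are hypothetical, and it assumes the topic name contains no characters that need JSON escaping.

```scala
object AssignJsonDemo {
  // Hypothetical helper: builds the JSON string expected by the "assign" option,
  // mapping one topic name to a list of partition numbers.
  // Assumes the topic name needs no JSON escaping.
  def assignJson(topic: String, partitions: Seq[Int]): String =
    s"""{"$topic":[${partitions.mkString(",")}]}"""

  def main(args: Array[String]): Unit = {
    // "events" is a made-up topic name used only for illustration.
    println(assignJson("events", Seq(0, 2)))
  }
}
```

The result can then be passed to the reader as .option("assign", AssignJsonDemo.assignJson(topic, Seq(0))) in place of the literal string.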
Similarly, how do you write to a specific partition? I tried this and it doesn't work.
someDF
.selectExpr("key", "value")
.writeStream
.format("kafka")
.option("kafka.bootstrap.servers", kafkaServers)
.option("topic", "someTopic")
.option("partition", partIdx)
.start()
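As far as I know, the Kafka sink has no "partition" option, which would explain why the attempt above has no effect. On Spark versions whose Kafka sink supports it (3.0 or later, to my knowledge), the target partition is instead taken from an optional integer column named partition on each row. The following is a sketch under that assumption, not a verified fix for this exact setup; the object name, the function name, and the checkpointDir parameter are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.streaming.StreamingQuery

object WriteToPartition {
  // Sketch: send every row of a streaming DataFrame to one Kafka partition
  // by adding an integer "partition" column that the Kafka sink reads per row.
  // Assumes a Spark version whose Kafka sink supports the "partition" column.
  def writeToPartition(
      df: DataFrame,
      kafkaServers: String,
      topic: String,
      partIdx: Int,
      checkpointDir: String // hypothetical parameter: any writable directory
  ): StreamingQuery =
    df.selectExpr("key", "value") // key and value must be string or binary
      .withColumn("partition", lit(partIdx)) // lit on an Int yields an int column
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", kafkaServers)
      .option("topic", topic)
      .option("checkpointLocation", checkpointDir) // streaming writes to Kafka need a checkpoint location
      .start()
}
```

If rows should go to different partitions, the partition column can be computed per row instead of using a constant. When the column is absent or null, the Kafka producer picks the partition; a custom partitioner can be supplied through the kafka.partitioner.class option.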