
How to read from specific Kafka partition in Spark structured streaming

I have three partitions for my Kafka topic and I was wondering if I could read from just one partition out of the three. My consumer is a Spark structured streaming application.

Below are my existing Kafka settings in Spark.

  val inputDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("subscribe", topic)
    .option("startingOffsets", "latest")
    .load()

Here is how you can read from a specific partition. Use the `assign` option instead of `subscribe` (the two are mutually exclusive): its value is a JSON string mapping each topic name to the list of partition numbers to consume.

  val inputDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("assign", s"""{"$topic":[0]}""")  // key is the topic name, value is the partition list
    .option("startingOffsets", "latest")
    .load()

PS: To read from multiple partitions instead of one, list them all in the JSON array --> """{"topic":[0,1,2..n]}"""
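If you also need control over where each assigned partition starts, `startingOffsets` accepts a per-partition JSON spec alongside `assign` (per the Spark Kafka integration guide, `-2` means earliest and `-1` means latest). A minimal sketch, assuming `spark`, `brokers`, and a topic named `someTopic` (the topic name and partition numbers here are illustrative):

```scala
// Sketch: assign two specific partitions and give each its own starting offset.
val inputDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", brokers)
  // Consume only partitions 0 and 2 of "someTopic".
  .option("assign", """{"someTopic":[0,2]}""")
  // Start partition 0 at offset 23 and partition 2 at the latest offset (-1).
  .option("startingOffsets", """{"someTopic":{"0":23,"2":-1}}""")
  .load()
```

Note that when you use `assign`, the offset JSON must cover exactly the partitions you assigned.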

Similarly, how do you write to a specific partition? I tried this and it doesn't work.

  someDF
    .selectExpr("key", "value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", kafkaServers)
    .option("topic", "someTopic")
    .option("partition", partIdx)
    .start()
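The snippet above fails because the Kafka sink has no `partition` option. A hedged alternative: recent versions of the Spark Kafka integration guide document an optional int `partition` column in the rows being written, so you can route records by adding that column (check the guide for your Spark version; `someDF`, `kafkaServers`, and `partIdx` are carried over from above):

```scala
// Sketch: target a partition via the optional "partition" column rather than
// a writer option. Assumes a recent Spark version whose Kafka sink honors it.
import org.apache.spark.sql.functions.lit

someDF
  .selectExpr("key", "value")
  .withColumn("partition", lit(partIdx))  // send every record to partition partIdx
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaServers)
  .option("topic", "someTopic")
  .start()
```

If your Spark version does not support the `partition` column, another route is supplying a custom Kafka partitioner class via the `kafka.partitioner.class` producer option.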
