
Rewind Offset Spark Structured Streaming from Kafka

I am using Spark Structured Streaming (2.2.1) to consume a topic from Kafka (0.10).

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", fromKafkaServers)
  .option("subscribe", topicName)
  .option("startingOffset", "earliest")
  .load()

My checkpoint location is set to an external HDFS directory. In some cases, I would like to restart the streaming application and consume data from the beginning. However, even though I delete all the checkpointing data from the HDFS directory and resubmit the jar, Spark is still able to find my last consumed offset and resume from there. Where else does the offset live? I suspect it is related to the Kafka consumer ID. However, per the Spark docs I am unable to set group.id with Spark Structured Streaming, and it seems like all applications subscribing to the same topic get assigned to one consumer group. What if I would like to have two independent streaming jobs running that subscribe to the same topic?

You have a typo :) It's startingOffsets.
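For reference, a minimal sketch of the corrected reader (note the plural `startingOffsets`; with the misspelled name the option is silently ignored and the default applies), together with per-job checkpoint locations so two jobs can consume the same topic independently. `fromKafkaServers`, `topicName`, and the output/checkpoint paths are illustrative placeholders, not values from the original question:

```scala
// Corrected option name: "startingOffsets" (plural).
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", fromKafkaServers)
  .option("subscribe", topicName)
  .option("startingOffsets", "earliest")
  .load()

// Structured Streaming tracks consumed offsets in the checkpoint directory,
// not in a Kafka consumer group, so two independent jobs on the same topic
// only need two different checkpoint locations (paths are placeholders):
val queryA = df.writeStream
  .format("parquet")
  .option("path", "/data/out/jobA")
  .option("checkpointLocation", "/checkpoints/jobA") // jobB uses its own dir
  .start()
```

Deleting a job's checkpoint directory (and using `startingOffsets` spelled correctly) is then sufficient to make that job restart from the earliest available offsets.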

