
Java Kafka Structured Streaming

I have to perform batch queries (basically in a loop) from Kafka via Spark, each time starting from the last offset read at the previous iteration, so that I only read new data.

Dataset<Row> df = spark
                .read()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "test-reader")
                .option("enable.auto.commit", true)
                .option("kafka.group.id", "demo-reader") //not sure about the one to use
                .option("group.id", "demo-reader")
                .option("startingOffsets", "latest")
                .load();

It seems that latest is not supported in batch queries. I'm wondering if it is possible to do something similar in another way (without dealing directly with offsets).

EDIT: earliest seems to retrieve the whole data contained in the topic.

Can you try earliest instead of latest for startingOffsets, as shown in the example below:

Dataset<Row> df = spark
  .read()
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test-reader")
  .option("enable.auto.commit", true)
  .option("kafka.group.id", "demo-reader") //not sure about the one to use
  .option("group.id", "demo-reader")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load();

Please refer to the Spark docs.
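
If you really need each iteration to pick up only the records produced since the previous one, the batch Kafka source also accepts explicit per-partition offsets as JSON for startingOffsets and endingOffsets. Below is a minimal sketch of that idea; it assumes a single-partition topic, and the lastOffset bookkeeping (and the OffsetRangeBatchRead class) is hypothetical code you would maintain yourself, not a Spark API:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OffsetRangeBatchRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-batch-offsets")
                .master("local[*]") // assumption: local run, only for illustration
                .getOrCreate();

        // Hypothetical bookkeeping: the next offset to read, persisted by your own code
        // between iterations (file, database, ...). -2 means "earliest" on the first run.
        long lastOffset = -2L;

        // Per-partition offsets are passed as JSON: {"topic":{"partition":offset}}.
        // This assumes topic "test-reader" has a single partition 0.
        String startingOffsets = "{\"test-reader\":{\"0\":" + lastOffset + "}}";

        Dataset<Row> df = spark
                .read()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "test-reader")
                .option("startingOffsets", startingOffsets)
                .option("endingOffsets", "latest") // up to the current end of the partition
                .load();

        df.show();
        // After processing, store max("offset") + 1 as the starting offset for the next iteration.
    }
}

The offset column of the returned DataFrame gives the Kafka offset of each record, so the last value read can be carried into the next loop iteration.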

You should use "latest" for streaming and "earliest" for batch, as per the documentation.
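
Building on that: if the goal is to repeatedly read only new data without tracking offsets yourself, a streaming query with a checkpoint is the usual way to get that behaviour, because the checkpoint stores the last consumed offsets between runs. A rough sketch under those assumptions (the console sink, the checkpoint path and the class name are placeholders, not taken from the question):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class KafkaIncrementalRead {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-incremental")
                .master("local[*]") // assumption: local run, only for illustration
                .getOrCreate();

        // "latest" only applies the very first time the query starts;
        // on later runs the checkpoint remembers where the previous run stopped.
        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "test-reader")
                .option("startingOffsets", "latest")
                .load();

        // Trigger.Once() processes whatever is new and then stops, so running this
        // repeatedly behaves like a batch job that resumes from the last offsets.
        StreamingQuery query = df
                .writeStream()
                .format("console") // placeholder sink for the sketch
                .option("checkpointLocation", "/tmp/demo-reader-ckpt") // hypothetical path; offsets are kept here
                .trigger(Trigger.Once())
                .start();

        query.awaitTermination();
    }
}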
