How to set offset committed by the consumer group using Spark's Direct Stream for Kafka?

I am trying to use Spark's Direct Approach (No Receivers) for Kafka. I have the following Kafka configuration map:

configMap.put("zookeeper.connect","192.168.51.98:2181");
configMap.put("group.id", UUID.randomUUID().toString());
configMap.put("auto.offset.reset","smallest");
configMap.put("auto.commit.enable","true");
configMap.put("topics","IPDR31");
configMap.put("kafka.consumer.id","kafkasparkuser");
configMap.put("bootstrap.servers","192.168.50.124:9092");

Now my objective is: if my Spark pipeline crashes and is restarted, the stream should resume from the latest offset committed by the consumer group. For that purpose, I want to specify the starting offset for the consumer. I have the offsets committed in each partition; how can I supply this information to the streaming function? Currently I am using:

JavaPairInputDStream<byte[], byte[]> kafkaData =
    KafkaUtils.createDirectStream(js, byte[].class, byte[].class,
        DefaultDecoder.class, DefaultDecoder.class, configMap, topic);

Look at the second form of createDirectStream in the Spark API docs - it allows you to pass in a Map<TopicAndPartition, Long>, where the Long is the starting offset.
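For illustration, here is a minimal sketch of that overload, assuming the Spark 1.x spark-streaming-kafka API and reusing js and configMap from the question; the partition numbers and offset values (1234L, 5678L) are placeholders you would load from your own offset store:

import java.util.HashMap;
import java.util.Map;

import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.DefaultDecoder;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

// Starting offsets, loaded from wherever you persisted them (placeholder values here)
Map<TopicAndPartition, Long> fromOffsets = new HashMap<>();
fromOffsets.put(new TopicAndPartition("IPDR31", 0), 1234L);
fromOffsets.put(new TopicAndPartition("IPDR31", 1), 5678L);

JavaInputDStream<Tuple2<byte[], byte[]>> kafkaData =
    KafkaUtils.createDirectStream(
        js, byte[].class, byte[].class,
        DefaultDecoder.class, DefaultDecoder.class,
        (Class<Tuple2<byte[], byte[]>>) (Class<?>) Tuple2.class,
        configMap, fromOffsets,
        new Function<MessageAndMetadata<byte[], byte[]>, Tuple2<byte[], byte[]>>() {
            @Override
            public Tuple2<byte[], byte[]> call(MessageAndMetadata<byte[], byte[]> mmd) {
                // The message handler decides what each record becomes downstream
                return new Tuple2<>(mmd.key(), mmd.message());
            }
        });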

Note that Spark will not automatically update your offsets in ZooKeeper when using a direct stream - you have to write them yourself, either to ZK or to some other store (see the sketch below). Unless you have a strict requirement for exactly-once semantics, it is easier to use the createStream method to get back a DStream; in that case Spark will update the offsets in ZK and resume from the last stored offset after a failure.
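If you do stay with the direct stream, the usual pattern from Spark's Kafka integration guide is to read the offset ranges of each batch and persist them yourself. A sketch, assuming Spark 1.4+ with Java 8 and the direct stream from the question; saveOffset is a hypothetical helper for your own store:

import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

kafkaData.foreachRDD(rdd -> {
    // Cast the underlying RDD to HasOffsetRanges to read the offsets this batch covers
    OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    for (OffsetRange r : ranges) {
        // Persist topic/partition/untilOffset to ZK or your own database
        saveOffset(r.topic(), r.partition(), r.untilOffset()); // hypothetical helper
    }
});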

For your requirement, the correct solution is to use checkpointing. For every batch processed, the checkpoint writes metadata to a shared storage you specify (typically HDFS). It is metadata, not the actual data, so there is no real performance impact.

If the Spark process crashes and is restarted, it will first read the checkpoint and resume from the offsets saved there.
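Concretely, recovery is driven by JavaStreamingContext.getOrCreate: on first start it builds a fresh context, and after a crash it reconstructs the context, including the direct stream's offsets, from the checkpoint directory. A minimal sketch, assuming the Spark 1.x API; the HDFS path is a placeholder:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

final String checkpointDir = "hdfs://namenode:8020/spark/checkpoints"; // placeholder path
final SparkConf conf = new SparkConf().setAppName("kafka-direct");

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        // Define the Kafka direct stream and all transformations here,
        // so the whole graph can be rebuilt from the checkpoint on restart
        jssc.checkpoint(checkpointDir);
        return jssc;
    }
};

// Loads the checkpoint if one exists, otherwise calls factory.create()
JavaStreamingContext context = JavaStreamingContext.getOrCreate(checkpointDir, factory);
context.start();
context.awaitTermination();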

You can refer to this sample code where I use Spark Streaming to write data to Elasticsearch reliably using checkpoints: https://github.com/atulsm/Test_Projects/blob/master/src/spark/StreamingKafkaRecoverableDirectEvent.java
