
Kafka Streams application reset: globalTable and input topics

I am building a Kafka Streams application that needs to scale horizontally and to reprocess data if the business logic fails.

The application consumes data from two topics that have the same number of partitions and are combined with KStream::merge. There is also a third topic whose data has to be consumed by all application instances, and this is where my difficulty lies.

So far I have tried using a globalTable to provide the data from the global topic, but I'm unsure how it behaves when I reset the application to consume historical data.

As far as I understand, after an application reset all of the merged input topics are consumed in such a way that the processor receives data with increasing timestamps. My concern is that this does not apply when I consume data via a GlobalTable and provide it to processors via a StateStore. It seems that as I reprocess the data, the state served from the StateStore is just the latest consumed state and is not related to the input data by timestamp.
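
To make the concern concrete, here is a minimal plain-Java model of it. It is not the Kafka Streams API: the class `GlobalLookupReplayModel`, the `Event` type, and the sample keys and values are all hypothetical, and the model simply assumes that the global topic is loaded completely before the replay starts and that lookups return the latest value per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (not the Kafka Streams API) of a latest-value lookup during replay. */
final class GlobalLookupReplayModel {

    /** Hypothetical record type: a key, a value, and an event timestamp. */
    static final class Event {
        final String key;
        final String value;
        final long timestamp;

        Event(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Loads the whole global topic up front; later updates overwrite earlier ones. */
    static Map<String, String> bootstrap(List<Event> globalTopic) {
        Map<String, String> store = new HashMap<>();
        for (Event e : globalTopic) {
            store.put(e.key, e.value);
        }
        return store;
    }

    /** Replays the stream; each record sees whatever the store holds, regardless of timestamps. */
    static List<String> replay(List<Event> stream, Map<String, String> store) {
        List<String> out = new ArrayList<>();
        for (Event e : stream) {
            out.add(e.value + "@" + e.timestamp + " -> " + store.get(e.key));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data: "rate" changes from v1 (t=100) to v2 (t=300).
        List<Event> globalTopic = Arrays.asList(
                new Event("rate", "v1", 100L),
                new Event("rate", "v2", 300L));
        List<Event> stream = Arrays.asList(
                new Event("rate", "order-A", 150L),
                new Event("rate", "order-B", 350L));
        replay(stream, bootstrap(globalTopic)).forEach(System.out::println);
    }
}
```

In this model the record with timestamp 150 is matched with `v2`, although `v2` was only written at timestamp 300. That is the behavior I am worried about when reprocessing.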

My questions are:

  1. How do I provide the "global" input topic to all application instances, so that each of them has all the data?
  2. How does the GlobalTable state store behave after an application reset? Is the state topic consumed in sync with the other input topics?

I'm not sure what you mean by a "global" input topic. But if the same data has to be consumed by multiple applications for data transformation/enrichment down the line, the best option would be to set up as many consumer groups as there are applications subscribed to the topic. That way, the same set of data is broadcast to every consumer group, each of which can contain multiple consumer instances of the same application.

Also, by application reset, do you mean the ART (Application Reset Tool) that Kafka provides through the CLI?

If so, as far as I understand, there is no specific requirement for the GlobalTable alone; the ART operates on the stream processor as a single entity.

What happens to the Kafka state store when you use the application reset tool?

https://docs.confluent.io/4.0.0/streams/developer-guide/app-reset-tool.html

What does the ART do?

Input topics: Reset offsets to the specified position (by default, the beginning of the topic).

Intermediate topics: Skip to the end of the topic, i.e., set the application's committed consumer offsets for all partitions to each partition's logSize (for consumer group application.id).

Internal topics: Delete the internal topic (this automatically deletes any committed offsets).
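
The three rules above can be sketched as a small plain-Java model. This is only an illustration, not the reset tool or any Kafka API: the class `ResetRulesModel`, the `TopicKind` enum, and the topic names are hypothetical, each topic is treated as having a single partition, and "beginning of the topic" is assumed to be offset 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model (not the real tool) of the three reset rules, one partition per topic. */
final class ResetRulesModel {

    enum TopicKind { INPUT, INTERMEDIATE, INTERNAL }

    /** Returns the committed offset per topic after a reset, following the rules listed above. */
    static Map<String, Long> committedAfterReset(Map<String, TopicKind> topics,
                                                 Map<String, Long> logSizes) {
        Map<String, Long> committed = new LinkedHashMap<>();
        for (Map.Entry<String, TopicKind> e : topics.entrySet()) {
            switch (e.getValue()) {
                case INPUT:
                    // Default position: the beginning of the topic (assumed to be offset 0).
                    committed.put(e.getKey(), 0L);
                    break;
                case INTERMEDIATE:
                    // Skip to the end: committed offset equals the partition's log size.
                    committed.put(e.getKey(), logSizes.get(e.getKey()));
                    break;
                case INTERNAL:
                    // The topic is deleted, so no committed offset remains for it.
                    break;
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        // Hypothetical topic names and log sizes.
        Map<String, TopicKind> topics = new LinkedHashMap<>();
        topics.put("orders", TopicKind.INPUT);
        topics.put("orders-rekeyed", TopicKind.INTERMEDIATE);
        topics.put("my-app-store-changelog", TopicKind.INTERNAL);

        Map<String, Long> logSizes = new LinkedHashMap<>();
        logSizes.put("orders", 500L);
        logSizes.put("orders-rekeyed", 120L);
        logSizes.put("my-app-store-changelog", 80L);

        System.out.println(committedAfterReset(topics, logSizes));
    }
}
```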

What doesn't the ART do?

  • Reset the output topics of an application. If any output (or intermediate) topics are consumed by downstream applications, it is your responsibility to adjust those downstream applications as appropriate when you reset the upstream application.

  • Reset the local environment of your application instances. It is your responsibility to delete the local state on any machine on which an application instance was running.

For a complete application reset, you must delete the application's local state directory on any machine where an application instance was running. You must do this before restarting an application instance on the same machine. You can use either of these methods:

  • The API method KafkaStreams#cleanUp() in your application code.
  • Manually delete the corresponding local state directory (default location: /tmp/kafka-streams/<application.id>).
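
The manual option can also be scripted. The following is a minimal sketch using only the Java standard library, assuming the default state directory quoted above; the class `LocalStateCleaner`, the method `deleteLocalState`, and the application id `my-streams-app` are hypothetical names chosen for illustration. It should only be run while no application instance is running on that machine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: removes <stateDir>/<application.id> recursively using only the JDK. */
final class LocalStateCleaner {

    /** Deletes the application's local state directory; returns false if it did not exist. */
    static boolean deleteLocalState(Path stateDir, String applicationId) throws IOException {
        if (applicationId == null || applicationId.trim().isEmpty()) {
            // Guard against resolving to the shared state directory itself.
            throw new IllegalArgumentException("applicationId must not be empty");
        }
        Path appDir = stateDir.resolve(applicationId);
        if (!Files.exists(appDir)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(appDir)) {
            // Reverse order puts children before their parent directories.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // "/tmp/kafka-streams" is the default location; "my-streams-app" is a placeholder id.
        boolean deleted = deleteLocalState(Paths.get("/tmp/kafka-streams"), "my-streams-app");
        System.out.println(deleted ? "local state deleted" : "no local state found");
    }
}
```

If the application sets a custom state directory, pass that path instead of the default.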
