
Kafka streams application reset globalTable and input topics

I am building a kafka-streams application with the intent of horizontal scaling and data reprocessing in case of business logic failure.

The application consumes data from two topics that have the same number of partitions and are combined with KStream::merge . There is also a third topic whose data has to be consumed by all application instances, and this is where my difficulty lies.

So far I tried to use a globalTable to provide the data from the global topic, but I'm unsure of its behavior when I reset the application to consume historical data.

As far as I understand, after an application reset all of the merged input topics are consumed in such a way that the processor receives data with increasing timestamps. My concern is that this does not apply when I consume data via GlobalTable and provide it to processors via a StateStore . It seems that as I reprocess the data, the state served from the StateStore is simply the latest consumed state and is not aligned with the input data by timestamp.

My questions are:

  1. How do I provide the "global" input topic to all application instances, so that each of them has all the data?
  2. How does GlobalTable state store behave after application reset? Is the state topic consumed in sync with other input topics?

I'm not sure what you mean by a "global" input topic. But if the same data has to be consumed by multiple applications for data transformation or enrichment further down the line, the best option is to give each application its own consumer group, all subscribed to the same topic. That way the same set of data is broadcast to every consumer group, and each group can contain multiple consumer instances of the same application.
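In Kafka Streams the consumer group is taken from application.id, so "one consumer group per application" comes down to giving each application a distinct application.id. Below is a minimal sketch of such a configuration using plain java.util.Properties. The application names and the broker address are hypothetical placeholders, not values from the question.

```java
import java.util.Properties;

// Sketch only: builds the configuration for two separate applications that
// read the same topic. Names and the broker address are made-up placeholders.
class ConsumerGroupSketch {

    // Kafka Streams uses application.id as the consumer group id, so two
    // applications with different ids each receive every record of a topic.
    static Properties streamsConfig(String applicationId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        return props;
    }

    public static void main(String[] args) {
        Properties enrichment = streamsConfig("enrichment-app"); // hypothetical name
        Properties reporting = streamsConfig("reporting-app");   // hypothetical name
        System.out.println(enrichment.getProperty("application.id"));
        System.out.println(reporting.getProperty("application.id"));
    }
}
```

Note that instances sharing the same application.id split the topic's partitions among themselves, so this pattern broadcasts data across applications, not across the instances of a single application.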

Also, by application reset, do you mean the Application Reset Tool (ART) that Kafka provides through the CLI?

If so, as far as I understand there is no requirement specific to GlobalTable; the ART operates on the stream processing application as a single entity.

What happens to the Kafka state store when you use the application reset tool?

https://docs.confluent.io/4.0.0/streams/developer-guide/app-reset-tool.html

What does the ART do?

Input topics : Reset offsets to the specified position (by default, to the beginning of the topic).

Intermediate topics : Skip to the end of the topic, i.e., set the application's committed consumer offsets for all partitions to each partition's logSize (for consumer group application.id).

Internal topics : Delete the internal topic (this automatically deletes any committed offsets).
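The three rules above can be summarised as a small toy model. This is only an illustration of the rules as listed, not the tool's implementation, and it does not talk to Kafka. The enum, the method, and the topic names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the three reset rules. Topic names and sizes are invented.
class ResetRulesSketch {

    enum TopicKind { INPUT, INTERMEDIATE, INTERNAL }

    // Returns the committed offset after a reset, or null when the topic
    // (and with it any committed offset) is deleted.
    static Long offsetAfterReset(TopicKind kind, long earliestOffset, long logSize) {
        switch (kind) {
            case INPUT:        return earliestOffset; // default: beginning of the topic
            case INTERMEDIATE: return logSize;        // skip to the end of the topic
            case INTERNAL:     return null;           // internal topic is deleted
            default: throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    public static void main(String[] args) {
        Map<String, TopicKind> topics = new LinkedHashMap<>();
        topics.put("orders", TopicKind.INPUT);                    // hypothetical topic
        topics.put("orders-rekeyed", TopicKind.INTERMEDIATE);     // hypothetical topic
        topics.put("my-app-store-changelog", TopicKind.INTERNAL); // hypothetical topic

        long earliest = 0L;   // assume nothing has been removed by retention
        long logSize = 1000L; // pretend each partition holds 1000 records
        topics.forEach((name, kind) ->
            System.out.println(name + " -> " + offsetAfterReset(kind, earliest, logSize)));
    }
}
```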

What doesn't the ART do?

  • Reset output topics of an application. If any output (or intermediate) topics are consumed by downstream applications, it is your responsibility to adjust those downstream applications as appropriate when you reset the upstream application.

  • Reset the local environment of your application instances. It is your responsibility to delete the local state on any machine on which an application instance was running.

For a complete application reset, you must delete the application's local state directory on any machines where the application instance was running. You must do this before restarting an application instance on the same machine. You can use either of these methods:

  • The API method KafkaStreams#cleanUp() in your application code.
  • Manually delete the corresponding local state directory (default location: /tmp/kafka-streams/<application.id>).
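For the manual option, a minimal sketch of deleting the local state directory with the standard java.nio API is shown below. It assumes the directory layout <state dir>/<application.id> described above; the class and method names are hypothetical, and it should only be run while the application instance on that machine is stopped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: removes the local state directory of one application.
class StateDirCleanupSketch {

    // Resolves <stateDir>/<applicationId>, mirroring the default layout
    // /tmp/kafka-streams/<application.id>.
    static Path localStateDir(String stateDir, String applicationId) {
        return Paths.get(stateDir, applicationId);
    }

    // Deletes the directory tree; returns false if there was nothing to delete.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: StateDirCleanupSketch <stateDir> <applicationId>");
            return;
        }
        Path dir = localStateDir(args[0], args[1]);
        System.out.println(deleteRecursively(dir) ? "deleted " + dir : "nothing to delete at " + dir);
    }
}
```

Inside the application itself, KafkaStreams#cleanUp() is the more direct route; the sketch is only meant for cleaning up from outside the application.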

