
Starting a new Kafka Streams microservice when the input topics have a data retention period

Let's assume I have a (somewhat) high-velocity input topic, for example sensor.temperature, with a retention period of 1 day. Multiple microservices already consume data from it. I also back up the events in a historical event store.

Now (as a simplified example) I have a new requirement: calculating the all-time maximum temperature per sensor. This fits Kafka Streams very well, so I have prepared a new microservice that builds a KTable aggregating the temperature (with max) grouped by sensor. Simply deploying this microservice would be enough if the input topic had infinite retention. As things are, the aggregation would only ever see the last day of events, so the maximum would not be all-time, which is our requirement.
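To make the retention problem concrete, here is a plain-Java model of the per-sensor max aggregation (it does not use the Kafka Streams library). In a real topology the equivalent would be something like `builder.stream("sensor.temperature").groupByKey().reduce(Math::max)`, assuming the topic is keyed by sensor id with Double values; that keying and the `Reading` class below are assumptions for illustration. The point is that the aggregation can only fold the records the topic still retains.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxPerSensor {
    // Hypothetical event shape standing in for one record on sensor.temperature
    static final class Reading {
        final String sensorId;
        final double celsius;

        Reading(String sensorId, double celsius) {
            this.sensorId = sensorId;
            this.celsius = celsius;
        }
    }

    // Plain-Java model of groupByKey().reduce(Math::max): one running maximum per sensor key.
    // It can only fold the readings it is given, i.e. whatever the topic still retains.
    static Map<String, Double> aggregateMax(List<Reading> retained) {
        Map<String, Double> table = new HashMap<>();
        for (Reading r : retained) {
            table.merge(r.sensorId, r.celsius, Math::max);
        }
        return table;
    }

    public static void main(String[] args) {
        // Only the last day of readings is still on the topic; an older 40.0 for s1 has expired
        List<Reading> lastDay = List.of(
                new Reading("s1", 21.5),
                new Reading("s2", 19.0),
                new Reading("s1", 23.0));
        Map<String, Double> table = aggregateMax(lastDay);
        System.out.println("s1 max seen: " + table.get("s1")); // prints "s1 max seen: 23.0"
    }
}
```

A freshly deployed service therefore reports 23.0 for s1 even if the historical event store holds a higher value.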

I feel this must be a common scenario, but somehow I was not able to find a satisfying solution on the internet.

Maybe I am missing something, but my ideas for making it work do not feel great:

  1. Replay all past events into the input topic sensor.temperature. This is a large amount of data, and it would cause all subscribing microservices to run excessive computation, which is most likely not acceptable.
  2. Create a duplicate of the input topic for my microservice, sensor.temperature.local, into which I would always copy all events and then further process (aggregate) them from this local topic. This way I can freely replay historical events into the local topic without affecting other microservices. However, such a local duplicate would be required for every Kafka Streams microservice, and if the input topic is high-velocity this could be too much duplication.
  3. Maybe there is some way to modify KTables more directly, so one could query the historical event store for the max value per sensor and put it into the KTable once? But what if the streams topology is more complex? It would require orchestrating consistent state across all the microservices' KTables, rather than simply replaying events.
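Options 2 and 3 both mix historical data with events that may still be on the topic, so some readings can be seen twice. For max that is harmless: max is commutative, associative and idempotent, so seeding from history and then folding in the retained events gives the same answer regardless of overlap. The plain-Java sketch below models a one-time seed under that assumption; `seededMax`, `Reading` and the sample values are hypothetical. The same guarantee does not hold for aggregations such as sum or count, where overlapping events would be double-counted, which is part of why more complex topologies are harder to seed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SeededMax {
    // Hypothetical event shape standing in for one record on sensor.temperature
    static final class Reading {
        final String sensorId;
        final double celsius;

        Reading(String sensorId, double celsius) {
            this.sensorId = sensorId;
            this.celsius = celsius;
        }
    }

    // Seed the per-sensor table once from historical maxima, then fold in the retained events.
    // Because max is idempotent, events present in both sources do not distort the result.
    static Map<String, Double> seededMax(Map<String, Double> historicalMax, List<Reading> retained) {
        Map<String, Double> table = new HashMap<>(historicalMax);
        for (Reading r : retained) {
            table.merge(r.sensorId, r.celsius, Math::max);
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical result of querying the historical event store for max per sensor
        Map<String, Double> history = Map.of("s1", 40.0, "s2", 19.0);
        // s2's 19.0 was backed up and is also still on the topic: counted twice, harmlessly
        List<Reading> retained = List.of(
                new Reading("s1", 23.0),
                new Reading("s2", 19.0),
                new Reading("s3", 15.5));
        Map<String, Double> table = seededMax(history, retained);
        System.out.println(table.get("s1") + " " + table.get("s2") + " " + table.get("s3")); // prints "40.0 19.0 15.5"
    }
}
```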

How should I design the solution?

Thanks in advance for your help!

In this case I would create a topic that stores the max periodically (so that it won't fall off the topic because of a cleanup). Your service can then report the larger of the max stored in the max topic and the max computed from the measurement topic.
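The max-of-max idea can be modelled in plain Java as below. It assumes, as illustration only, that the max topic holds one snapshot value per sensor (a compacted topic keyed by sensor id, `cleanup.policy=compact`, would be one natural fit) and that the checkpoint runs more often than the 1-day retention, so every reading is folded into some snapshot before it is cleaned up. `combine` and the day-by-day values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxOfMax {
    // Per-key maximum of two tables; keys present in only one table are kept as-is
    static Map<String, Double> combine(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new HashMap<>(a);
        b.forEach((sensor, value) -> out.merge(sensor, value, Math::max));
        return out;
    }

    public static void main(String[] args) {
        // Snapshot read back from the hypothetical max topic (starts empty)
        Map<String, Double> snapshot = new HashMap<>();

        // Day 1: max computed over what the measurement topic currently retains
        Map<String, Double> day1Live = Map.of("s1", 30.0);
        // Periodic checkpoint, taken before day 1's events are cleaned up
        snapshot = combine(snapshot, day1Live);

        // Day 2: day 1's events have expired; only new readings remain on the topic
        Map<String, Double> day2Live = Map.of("s1", 25.0, "s2", 18.0);

        // Reported value: max of the max topic and the max of the measurement topic
        Map<String, Double> reported = combine(snapshot, day2Live);
        System.out.println(reported.get("s1") + " " + reported.get("s2")); // prints "30.0 18.0"
    }
}
```

Because max is idempotent, it does not matter if the live window and the snapshot cover some of the same readings.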
