
Kafka as a data store for future events

I have a Kafka cluster that receives messages from a source whenever data changes in that source. In some cases the messages are meant to be processed in the future. So I have two options:

  1. Consume all messages and repost any that are meant for the future to Kafka under a different topic (with the target date in the topic name), and have a Storm topology that subscribes to the topic carrying that day's name. This ensures messages are processed only on the day they are meant for (see the sketch after this list).
  2. Store them in a separate DB and build a scheduler that reads the messages and posts them to Kafka only on that future date.
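For concreteness, here is a minimal sketch of the reposting half of Option 1 in Java, assuming each record carries its target processing date in a header named process-date and that daily topics follow an events-yyyy-MM-dd naming scheme; the header name, topic names, and string serialization are illustrative assumptions, not part of the original setup:

    // Sketch of Option 1: route records that are due in the future onto a
    // date-named topic, so a daily topology only subscribes to today's topic.
    // Assumes a "process-date" header (yyyy-MM-dd) -- an illustrative choice.
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.common.serialization.*;

    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.time.LocalDate;
    import java.util.List;
    import java.util.Properties;

    public class FutureEventRouter {
        public static void main(String[] args) {
            Properties cProps = new Properties();
            cProps.put("bootstrap.servers", "localhost:9092");
            cProps.put("group.id", "future-event-router");
            cProps.put("key.deserializer", StringDeserializer.class.getName());
            cProps.put("value.deserializer", StringDeserializer.class.getName());

            Properties pProps = new Properties();
            pProps.put("bootstrap.servers", "localhost:9092");
            pProps.put("key.serializer", StringSerializer.class.getName());
            pProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
                consumer.subscribe(List.of("source-events"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        Header h = rec.headers().lastHeader("process-date");
                        String date = h == null ? null
                                : new String(h.value(), StandardCharsets.UTF_8);
                        if (date != null && LocalDate.parse(date).isAfter(LocalDate.now())) {
                            // Not due yet: park it on the date-named topic,
                            // e.g. events-2024-06-01, for that day's topology.
                            producer.send(new ProducerRecord<>("events-" + date, rec.key(), rec.value()));
                        } else {
                            process(rec); // due today (or undated): handle now
                        }
                    }
                }
            }
        }

        private static void process(ConsumerRecord<String, String> rec) {
            System.out.printf("processing %s -> %s%n", rec.key(), rec.value());
        }
    }

The Storm topology would then subscribe only to events-&lt;today&gt; each day and drain it.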

Option 1 is easier to execute, but my question is: is Kafka a durable data store? Has anyone done this sort of eventing with Kafka? Are there any gaping holes in the design?

You can configure the amount of time your messages stay in Kafka (log.retention.hours).
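For reference, the broker-wide default lives in server.properties, and individual (e.g. date-named) topics can override it with retention.ms; the values below are illustrative, and the exact kafka-configs flags vary a little between Kafka versions:

    # server.properties -- broker-wide default: keep log segments for 7 days
    log.retention.hours=168

    # Per-topic override (e.g. keep a date-named topic ~3 days), set with the
    # stock kafka-configs tool; newer brokers take --bootstrap-server instead
    # of the older --zookeeper flag:
    # bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
    #     --entity-type topics --entity-name events-2024-06-01 \
    #     --add-config retention.ms=259200000

Note that for the future-events scheme in the question, retention must be at least as long as the furthest-out scheduled date, or messages will be deleted before they are ever consumed.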

But keep in mind that Kafka is meant to be used as a "real-time buffer" between your producers and your consumers, not as a durable data store. I don't think Kafka+Storm would be the appropriate tool for your use case. Why not write your messages to some distributed file system and schedule a job (MapReduce, Spark...) to process those events?
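As a rough illustration of that suggestion, the sketch below uses a local date-partitioned directory as a stand-in for a distributed file system; the /data/events/yyyy-MM-dd layout and plain-text event files are assumptions for the example:

    // Sketch of the file-system alternative: events are landed in a
    // date-partitioned directory, and a daily job picks up only the
    // partition that is due today. Local paths stand in for HDFS/S3.
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.*;
    import java.time.LocalDate;
    import java.util.stream.Stream;

    public class DailyEventJob {
        public static void main(String[] args) throws IOException {
            // e.g. /data/events/2024-06-01 -- the layout is an assumption
            Path todays = Paths.get("/data/events", LocalDate.now().toString());
            if (!Files.isDirectory(todays)) {
                System.out.println("no events due today: " + todays);
                return;
            }
            try (Stream<Path> files = Files.list(todays)) {
                files.forEach(DailyEventJob::process);
            }
        }

        private static void process(Path file) {
            try (Stream<String> lines = Files.lines(file)) {
                lines.forEach(event -> System.out.println("processing " + event));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

In production the same shape would typically be a scheduled Spark or MapReduce job over HDFS/S3 date partitions rather than a local directory scan.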
