
How should I set up Kafka partitions vs topics in my setup?

I'm looking to use Kafka as my event store/stream for orders; here are a few attributes:

  • I have two regions to cater for: London and New York
  • An order started in London is highly likely to have further events (updates) come from London, but we do need to support cross-regional reads/writes (i.e. for an order started in London, writes can come from New York)
  • The business would benefit from lower latency, so cross-region writes (London writing to New York or vice versa) should be minimized
  • An order has a lifetime of 24h; it can be archived from the event log at that point as we no longer need it
  • Need resiliency: if the London Kafka cluster goes down, I should be able to fail over to New York and vice versa
  • Ordering of the events needs to be consistent across all regions
  • Order numbers are only in the 1000s per 24h.

So I'm trying to get my setup of Kafka correct so I can minimise the amount of work I have to do external to Kafka, so my concerns/questions are:

  1. Originating region seems like a natural partitioning key, but as far as I can see I gain nothing from partitioning a topic...I could just have 2 topics, one for London, one for New York? Am I correct?
  2. As far as I can see, in order to have the ability to fail over, I need to set up two SEPARATE clusters and use MirrorMaker to sync the two topics across regions. But this would mean I would need to build logic into my applications so that they publish an event to the correct cluster - am I understanding correctly? Is there any way I can set up Kafka so I don't have to do this, and instead just connect to the local cluster and read/write to that, letting the cluster take care of where it routes the events?

You might want to look into the "rack awareness" configuration for brokers, which enables rack-aware partition replication. It is mostly used to reduce cross-availability-zone traffic; see the Kafka documentation on follower fetching (KIP-392). The gist is that your consumers can fetch records from the "nearest" replica. In your case a consumer sitting in London might only fetch data from brokers in London, assuming you operate a single cross-region cluster.
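To make that concrete, here is a minimal config sketch, assuming a single stretched cluster with brokers in both regions and Kafka 2.4+ (the rack names are placeholders you would choose yourself):

```properties
# server.properties on each broker: tag the broker with its region
# (use "newyork" on the New York brokers)
broker.rack=london

# broker-side: enable rack-aware follower fetching (KIP-392)
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# consumer config: fetch from the nearest replica instead of the leader
client.rack=london
```

With this in place, replicas are spread across both regions for resiliency, while London consumers read from London replicas.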

Concerning latency: if you don't have any sub-second requirements, I would highly recommend operating a single cluster instead of two. The latency between the US east coast and the UK shouldn't be too bad. Keep it simple; Kafka is very robust and can handle most faults within a single cluster (e.g. a broker dying). Start with a single cluster in one location; you will still be able to add a second one later and migrate your data over using MirrorMaker or a dedicated service.
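If you do later outgrow a single cluster, MirrorMaker 2 replication between two clusters is driven by a single properties file. A minimal sketch, assuming cluster aliases "london" and "newyork", a topic named "orders", and placeholder broker addresses:

```properties
# mm2.properties -- run with: bin/connect-mirror-maker.sh mm2.properties
clusters = london, newyork
london.bootstrap.servers = london-broker:9092
newyork.bootstrap.servers = newyork-broker:9092

# replicate the orders topic from London to New York
london->newyork.enabled = true
london->newyork.topics = orders
```

Note that by default MirrorMaker 2 prefixes replicated topics with the source cluster alias (the topic arrives as `london.orders` on the New York cluster), so consumers there need to subscribe to the remote name.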

This would also mean you don't end up with the "same" topic twice, once per region. Separate your topics based on their content, not their location. Otherwise you'll have lots of fun when migrating the data format you use for orders. You want to be as flexible as possible for future changes.
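On the ordering requirement: within a single topic, keying records by order ID (rather than by region) guarantees per-order event ordering, because Kafka sends all records with the same key to the same partition. A minimal sketch of that idea in Python (Kafka's actual default partitioner uses murmur2; the md5 hash and partition count here are illustrative stand-ins):

```python
import hashlib

NUM_PARTITIONS = 6  # assumed partition count for a hypothetical "orders" topic

def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map an order ID to a partition, mimicking keyed
    partitioning: same key -> same partition -> events stay ordered."""
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for one order lands on the same partition, so Kafka
# preserves their relative order for all consumers of that partition.
update_1 = partition_for("LDN-2024-0001")
update_2 = partition_for("LDN-2024-0001")
assert update_1 == update_2
```

At your volume (thousands of orders per day), a single keyed topic with a handful of partitions is more than enough throughput-wise.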
