
Correlating in Kafka and dynamic topics

I am building a correlated system using Kafka. Suppose there is a service A that performs data processing, and thousands of its clients, B, submit jobs to it. The Bs are short-lived: they appear on the network, push data to A, and then two important things happen:

  1. B immediately receives a status from A;
  2. B then either drops out completely, stays online to receive further status updates, or sporadically pops back on to check the status.

(This is not dissimilar to grid computing or MPI.)

Both points should be achieved using the well-known concept of a correlationId: B has a unique ID (a UUID in my case), which it sends to A in the message headers; A, in turn, uses it as the Reply-To topic to which it sends status updates. This means topics have to be created on the fly; they can't be predetermined.
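A minimal sketch of the naming and header side of this scheme, assuming a hypothetical `status.` prefix for reply topics. It checks the generated name against Kafka's legal topic characters (ASCII letters, digits, `.`, `_` and `-`, at most 249 characters) and shows one way to carry the UUID as a 16-byte header value, since Kafka 0.11 record headers hold `byte[]` values. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.regex.Pattern;

/** Derives B's reply topic and a header value from its correlation ID (illustrative names). */
public final class ReplyTopics {
    // Hypothetical naming convention: "status." followed by B's UUID.
    static final String PREFIX = "status.";
    // Legal Kafka topic names: ASCII letters, digits, '.', '_', '-'; at most 249 characters.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]{1,249}");

    private ReplyTopics() {}

    /** The Reply-To topic A would publish status updates to for client B. */
    static String replyTopicFor(UUID correlationId) {
        String topic = PREFIX + correlationId;
        if (!LEGAL.matcher(topic).matches()) {
            throw new IllegalArgumentException("illegal topic name: " + topic);
        }
        return topic;
    }

    /** Encodes the UUID as 16 bytes, suitable as a record header value. */
    static byte[] toHeaderValue(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Decodes a value produced by toHeaderValue (arguments are evaluated left to right). */
    static UUID fromHeaderValue(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(replyTopicFor(id));
        System.out.println(fromHeaderValue(toHeaderValue(id)).equals(id));
    }
}
```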

I have auto.create.topics.enable switched on, and it does create topics dynamically, but existing consumers are not aware of them and have to be restarted (to fetch topic metadata, I suppose, if I understood the docs right). I also checked the consumer's metadata.max.age.ms setting, but it doesn't seem to help, even if I set it to a very low value.
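For reference, a sketch of the client-side settings involved, built as the `java.util.Properties` object that `new KafkaConsumer<>(props)` accepts. Note that `auto.create.topics.enable` is a broker setting (it belongs in the broker's server.properties), whereas `metadata.max.age.ms` is a client setting whose default is 300000 ms. The `b-` group prefix and the 5000 ms value are placeholders, not recommendations.

```java
import java.util.Properties;

/** Consumer settings for one B listening on its reply topic (placeholder values). */
public final class ReplyConsumerConfig {
    private ReplyConsumerConfig() {}

    static Properties forClient(String bootstrapServers, String clientId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Hypothetical convention: one consumer group per B.
        props.put("group.id", "b-" + clientId);
        // Forces a metadata refresh at least this often (client default: 300000 ms).
        props.put("metadata.max.age.ms", "5000");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // With kafka-clients on the classpath these would be passed to new KafkaConsumer<>(props).
        forClient("localhost:9092", "42").list(System.out);
    }
}
```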

As far as I've read, this is either still unanswered (e.g. "kafka filtering/Dynamic topic creation", "kafka consumer to dynamically detect topics added", "Can a Kafka producer create topics and partitions?") or answered unsatisfactorily.

As there are hundreds of As and thousands of Bs, I can't possibly use shared topics or anything like them, lest I overload my network. I could use Kafka's AdminTools, or whatever it's called, to pre-create topics, but I find that somewhat silly (even though I have seen real-life examples of people using it to talk to ZooKeeper and the Kafka infrastructure itself).

So the question is: is there a way to dynamically create Kafka topics that makes both the consumer and the producer aware of them without restarting anything? And, in the worst case, will AdminTools really help, and on which side must I use it: A or B?

Kafka 0.11, Java 8

UPDATE: Creating topics with AdminClient doesn't help for whatever reason; consumers still throw LEADER_NOT_AVAILABLE when I try to subscribe.
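LEADER_NOT_AVAILABLE is often a transient condition right after a topic has been created, while its partition leader is still being elected, so a common workaround is to retry for a bounded time before giving up. A generic sketch of that, assuming a hypothetical readiness check passed in as a `BooleanSupplier` (in practice it might be built on `KafkaConsumer.partitionsFor(topic)` and `PartitionInfo.leader()`); this is not something the original poster describes doing.

```java
import java.util.function.BooleanSupplier;

/** Polls a readiness check with capped exponential backoff until it passes or attempts run out. */
public final class AwaitReady {
    private AwaitReady() {}

    static boolean await(BooleanSupplier ready, int maxAttempts, long initialBackoffMs) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    return false;
                }
                backoff = Math.min(backoff * 2, 5_000L); // never wait more than 5 s between checks
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated check that only succeeds on its third call.
        int[] calls = {0};
        boolean ok = await(() -> ++calls[0] >= 3, 5, 10);
        System.out.println(ok + " after " + calls[0] + " checks");
    }
}
```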

Creating an unbounded number of topics is not recommended. I'd advise redesigning your topology/system.

I've thought of making dynamic topics myself, but then realized that eventually ZooKeeper would fail by running out of memory due to stale topics (imagine how many topics could have been created a year from now). Maybe this could work if you ensure some upper bound on the number of topics ever created. Overall, it's an administrative headache.
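A related way to keep the topic count bounded is a registry that caps how many reply topics are live at once and hands back the least recently used ones for deletion. This is a hypothetical sketch of that bookkeeping only; actually deleting the evicted topics (for example through an admin client) is left out, and the class name is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps at most `capacity` live reply topics; least recently used ones are evicted. */
public final class BoundedTopicRegistry {
    private final List<String> evicted = new ArrayList<>();
    private final LinkedHashMap<String, Long> live;

    BoundedTopicRegistry(final int capacity) {
        // accessOrder = true: iteration order is least recently used first.
        this.live = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                boolean over = size() > capacity;
                if (over) {
                    evicted.add(eldest.getKey()); // candidate for topic deletion
                }
                return over;
            }
        };
    }

    /** Records use of a topic, registering it if it is new. */
    synchronized void touch(String topic, long nowMs) {
        live.put(topic, nowMs);
    }

    /** Returns and clears the topics that fell out of the registry. */
    synchronized List<String> drainEvicted() {
        List<String> out = new ArrayList<>(evicted);
        evicted.clear();
        return out;
    }

    synchronized int size() {
        return live.size();
    }

    public static void main(String[] args) {
        BoundedTopicRegistry registry = new BoundedTopicRegistry(2);
        registry.touch("status.a", 1);
        registry.touch("status.b", 2);
        registry.touch("status.a", 3); // a becomes the most recently used
        registry.touch("status.c", 4); // exceeds the cap, so b is evicted
        System.out.println("delete: " + registry.drainEvicted());
    }
}
```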

If you look up using Kafka for request/response, you will find others also saying it is awkward (e.g. "Does Kafka support request response messaging").
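The usual shape of request/response over a log like Kafka is a table of outstanding requests keyed by correlation ID: the requester registers an ID, sends the request with that ID in a header, and completes the matching entry when a reply carrying the same ID is consumed. A transport-agnostic sketch; `PendingReplies` and its methods are hypothetical names, and the producer/consumer wiring is omitted.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Matches replies to outstanding requests by correlation ID. */
public final class PendingReplies<T> {
    private final Map<UUID, CompletableFuture<T>> pending = new ConcurrentHashMap<>();

    /** Registers a request; the caller sends it with `id` in a header and waits on the future. */
    CompletableFuture<T> register(UUID id) {
        CompletableFuture<T> future = new CompletableFuture<>();
        if (pending.putIfAbsent(id, future) != null) {
            throw new IllegalStateException("duplicate correlation id " + id);
        }
        return future;
    }

    /** Called for each consumed reply; returns false for late or unknown correlation IDs. */
    boolean complete(UUID id, T reply) {
        CompletableFuture<T> future = pending.remove(id);
        return future != null && future.complete(reply);
    }

    int outstanding() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingReplies<String> replies = new PendingReplies<>();
        UUID id = UUID.randomUUID();
        CompletableFuture<String> status = replies.register(id);
        replies.complete(id, "ACCEPTED"); // would normally run on the consumer thread
        System.out.println(status.join());
    }
}
```

Since a B may drop out without waiting for its reply, a real version would also need to expire entries that are never completed.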

OK, so I'll answer my own question.

  1. Creating topics with AdminClient works only if it is done before the corresponding consumers are created.
  2. I changed my topology, taking point 1 into account and introducing an exchange of correlation IDs in message headers (the same as in JMS). I also had to implement certain topology management methods, grouping Bs into containers.
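Point 1 boils down to an ordering rule. A minimal sketch of it, with the Kafka calls hidden behind two hypothetical interfaces (`TopicAdmin`, `ConsumerFactory`) so the ordering can be shown and tested without a broker. A real `TopicAdmin` might wrap `AdminClient.createTopics(...)` with a `NewTopic` and block on `.all().get()`; the replication factor of 1 is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: make sure a reply topic exists before any consumer for it is created. */
public final class ReplyChannelSetup {
    /** Hypothetical stand-in for a blocking topic-creation call. */
    interface TopicAdmin {
        void createTopic(String name, int partitions, short replicationFactor);
    }

    /** Hypothetical stand-in for constructing a consumer and subscribing it to one topic. */
    interface ConsumerFactory<C> {
        C createAndSubscribe(String topic);
    }

    private ReplyChannelSetup() {}

    static <C> C open(String topic, TopicAdmin admin, ConsumerFactory<C> consumers) {
        // Step 1: create the single-partition topic and wait for the call to return.
        admin.createTopic(topic, 1, (short) 1);
        // Step 2: only then create the consumer that will read from it.
        return consumers.createAndSubscribe(topic);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        String consumer = open("status.example",
                (name, partitions, rf) -> calls.add("create " + name),
                topic -> {
                    calls.add("subscribe " + topic);
                    return "consumer(" + topic + ")";
                });
        System.out.println(calls + " -> " + consumer);
    }
}
```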

Note that, as many people have said, this only works when the Bs are in single-consumer groups and listen to topics with one partition.
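The grouping of Bs into containers from point 2 can be pictured roughly like this: one container-level consumer reads the updates for all of its Bs and routes each record by the correlation ID carried in its header. This is a hypothetical sketch of the routing table only (no Kafka calls, names invented for illustration), not the author's implementation. It also covers the three behaviours from the question: staying online, dropping out, and popping back on to check the status.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Routes status updates read by one container-level consumer to the Bs it hosts. */
public final class ContainerDispatcher {
    // Bs currently online, keyed by correlation ID.
    private final Map<UUID, Consumer<String>> online = new ConcurrentHashMap<>();
    // Last known status per correlation ID, for Bs that check back later.
    // Unbounded here; a real container would need to expire old entries.
    private final Map<UUID, String> lastStatus = new ConcurrentHashMap<>();

    /** A B that stays online registers a callback. */
    void register(UUID correlationId, Consumer<String> onStatus) {
        online.put(correlationId, onStatus);
    }

    /** A B that drops out unregisters; its later updates are only remembered. */
    void unregister(UUID correlationId) {
        online.remove(correlationId);
    }

    /** Called per consumed record (ID taken from its header); true if an online B got it. */
    boolean dispatch(UUID correlationId, String status) {
        lastStatus.put(correlationId, status);
        Consumer<String> handler = online.get(correlationId);
        if (handler == null) {
            return false;
        }
        handler.accept(status);
        return true;
    }

    /** Lets a B that pops back on look up its most recent status. */
    Optional<String> latestStatus(UUID correlationId) {
        return Optional.ofNullable(lastStatus.get(correlationId));
    }

    public static void main(String[] args) {
        ContainerDispatcher container = new ContainerDispatcher();
        UUID b = UUID.randomUUID();
        container.register(b, s -> System.out.println("B got " + s));
        container.dispatch(b, "RUNNING");
        container.unregister(b);
        container.dispatch(b, "DONE");
        System.out.println("B checks back: " + container.latestStatus(b).orElse("unknown"));
    }
}
```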

To get some idea of the work I'm doing, have a look at the middleware framework I've been working on: https://github.com/ikonkere/magic.
