
How to join multiple Kafka topics?

So I have...

  • The 1st topic has general application logs (log4j). It stores things like HTTP API requests/responses, warnings, and exceptions. There can be multiple logs associated with one logical business request. (These logs occur within seconds of each other.)
  • The 2nd topic contains commands from the above business request, which other services act on. (The commands also occur within seconds of each other, but possibly a couple of minutes after the original request.)
  • The 3rd topic contains events generated by the actions of those other services. (Most events complete within seconds, but some can take up to 3-5 days to be received.)

So a single logical business request can have multiple logs, commands, and events associated with it through a uuid that the microservices pass to each other.

So what technologies/patterns can be used to read the 3 topics, join them into a single JSON document per request, and then write those documents to, let's say, Elasticsearch?

Streaming?

You can use Kafka Streams, or KSQL, to achieve this. Which one depends on your preference/experience with Java, and also the specifics of the joins you want to do.

KSQL is the streaming SQL engine for Apache Kafka; with SQL alone you can declare stream processing applications against Kafka topics. You can filter, enrich, and aggregate topics. At the time of writing, only stream-table joins are supported. You can see an example in this article.

The Kafka Streams API is part of Apache Kafka: a Java library for stream processing of data in Kafka. KSQL is built on it, and it offers greater flexibility of processing, including stream-stream joins.
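To make the target of the join concrete, here is a minimal sketch of the result the question asks for: one JSON document per uuid, holding that request's logs, commands, and events. It is plain Java with no Kafka dependency and does not use the Kafka Streams API. The class names, the topic labels ("logs", "commands", "events"), and the record shape are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: an in-memory fold, not a Kafka Streams topology.
class RequestJoinSketch {

    // One record as it might arrive from any of the three topics (hypothetical shape).
    static final class Rec {
        final String topic;
        final String uuid;
        final String payload;

        Rec(String topic, String uuid, String payload) {
            this.topic = topic;
            this.uuid = uuid;
            this.payload = payload;
        }
    }

    // Everything collected so far for one business request.
    static final class RequestDoc {
        final String uuid;
        final List<String> logs = new ArrayList<>();
        final List<String> commands = new ArrayList<>();
        final List<String> events = new ArrayList<>();

        RequestDoc(String uuid) {
            this.uuid = uuid;
        }

        // Hand-rolled JSON; payloads are assumed to need no escaping.
        String toJson() {
            return "{\"uuid\":\"" + uuid + "\","
                + "\"logs\":" + array(logs) + ","
                + "\"commands\":" + array(commands) + ","
                + "\"events\":" + array(events) + "}";
        }

        private static String array(List<String> items) {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < items.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append('"').append(items.get(i)).append('"');
            }
            return sb.append(']').toString();
        }
    }

    // Fold records from all three topics into one document per uuid.
    static Map<String, RequestDoc> join(List<Rec> records) {
        Map<String, RequestDoc> docs = new LinkedHashMap<>();
        for (Rec r : records) {
            RequestDoc doc = docs.computeIfAbsent(r.uuid, RequestDoc::new);
            switch (r.topic) {
                case "logs": doc.logs.add(r.payload); break;
                case "commands": doc.commands.add(r.payload); break;
                case "events": doc.events.add(r.payload); break;
                default: throw new IllegalArgumentException("unknown topic: " + r.topic);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
            new Rec("logs", "u1", "GET /orders"),
            new Rec("commands", "u1", "ReserveStock"),
            new Rec("events", "u1", "StockReserved"));
        for (RequestDoc doc : join(records)) {
            System.out.println(doc.toJson());
        }
    }
}
```

A real streaming job would keep this per-uuid state in a fault-tolerant store and would have to decide when a document is complete enough to send to Elasticsearch, since some events can arrive days later; re-indexing the document under the same id each time it changes is one option.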

You can use KSQL to join the streams.

  1. KSQL has two constructs: Table and Stream.
  2. Currently, joins are supported only between a Stream and a Table, so you need to decide which of your topics is a good fit for which construct.
  3. A stream-table join does not need windowing.
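The reason a stream-table join needs no window can be sketched in a few lines: a table holds only the latest value per key, so each stream record is simply looked up against it. The code below is a plain-Java illustration of that semantics, not KSQL or Kafka Streams code; the class and method names are hypothetical. It also simplifies by loading the table completely before reading the stream, which a real engine does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the semantics of a stream-table join, in memory.
class StreamTableJoinSketch {

    // A "table" keeps only the latest value per key.
    static Map<String, String> toTable(List<String[]> changelog) {
        Map<String, String> table = new HashMap<>();
        for (String[] kv : changelog) {
            table.put(kv[0], kv[1]); // later records overwrite earlier ones
        }
        return table;
    }

    // Inner join: each stream record is enriched with the table's value
    // for its key; stream records without a match are dropped.
    static List<String> join(List<String[]> stream, Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (String[] kv : stream) {
            String tableValue = table.get(kv[0]);
            if (tableValue != null) {
                out.add(kv[0] + ": " + kv[1] + " + " + tableValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> requests = toTable(Arrays.<String[]>asList(
            new String[] {"u1", "request received"},
            new String[] {"u1", "request validated"}));
        List<String[]> commands = Arrays.<String[]>asList(
            new String[] {"u1", "ReserveStock"},
            new String[] {"u2", "NoMatchingRequest"});
        for (String line : join(commands, requests)) {
            System.out.println(line);
        }
    }
}
```

Note the consequence for the question's data: because a table keeps one value per key, modelling the log topic as a table would retain only the last log line per uuid unless the logs are aggregated into a single value first.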

Benefits of using KSQL:

  1. KSQL is easy to set up.
  2. KSQL uses SQL, which lets you query your data quickly.

Drawbacks:

  1. It's not production ready yet, but a release is expected in April 2018.
  2. It's a little buggy right now, but it should improve over the next few months.

Please have a look.

https://github.com/confluentinc/ksql
