How to do a Kafka Streams Left Join that returns LHS messages that have no corresponding RHS after a fixed period?

I'm new to Kafka Streams and have just put together a left join between Stream A and Stream B. In my test setup every A has a B that arrives a few milliseconds after it, but in real life some Bs may be missing or may arrive late (say, more than 250 ms after A). I want to be able to find these missing and late Bs. I thought it would be easy: do a left join between A and B, specify the window, and job done. To my surprise, I get two rows in the left join output for each A. Thinking about it, this makes sense: when A arrives there is no B yet, so a join row that looks like A-[null] is generated. A few milliseconds later B arrives, and then AB is generated.
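The two-row behaviour described above can be reproduced with a small plain-Java model. This is only a sketch of the semantics, not Kafka Streams code: the class `EagerLeftJoinModel`, the `"A:key"` / `"B:key"` event strings, and the output format are all made up for illustration, and the join window is ignored on the assumption that every B arrives inside it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of an eagerly emitting left join; not the Kafka Streams API. */
final class EagerLeftJoinModel {

    /**
     * Events are strings of the form "A:key" or "B:key", in arrival order.
     * Assumption: windowing is ignored, so every B counts as inside the join window.
     */
    static List<String> join(List<String> events) {
        Set<String> seenA = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String event : events) {
            String[] parts = event.split(":", 2);
            String side = parts[0];
            String key = parts[1];
            if (side.equals("A")) {
                seenA.add(key);
                out.add(key + "=A-[null]"); // emitted immediately, before any B is seen
            } else if (seenA.contains(key)) {
                out.add(key + "=AB");       // emitted later, when the matching B arrives
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // k1 gets its B; k2 never does.
        System.out.println(join(List.of("A:k1", "B:k1", "A:k2")));
    }
}
```

In this model k1 produces both an A-[null] row and an AB row, while k2 produces only the A-[null] row, which is why the raw join output alone does not distinguish "B not here yet" from "B never came".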

What I want is those A messages that have no corresponding B after, say, 100 ms: B could be late or might never arrive, but either way it did not arrive within 100 ms of A. Is there a standard pattern or idiomatic way to do this? At the moment I think I might need a consumer that receives A and then fires a message after a set time (although I'm not sure how that would be done without some clunky synchronous code), and then I would have to join that stream (call it Ax) with B.
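To pin the requirement down, here is a batch-style sketch in plain Java of the result I am after. It is not a streaming implementation and does not use Kafka Streams; `MissingBModel`, `missingOrLate`, and the arrival-time maps are hypothetical names, and timestamps are assumed to be milliseconds keyed by message key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Batch model of the requirement: which A keys had no B within the timeout? */
final class MissingBModel {

    /**
     * Returns, in key order, every A key whose B is absent or arrived more than
     * timeoutMs after A. Assumption: a B that arrived before its A counts as matched.
     */
    static List<String> missingOrLate(Map<String, Long> aArrivals,
                                      Map<String, Long> bArrivals,
                                      long timeoutMs) {
        List<String> result = new ArrayList<>();
        // TreeMap copy gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, Long> a : new TreeMap<>(aArrivals).entrySet()) {
            Long b = bArrivals.get(a.getKey());
            if (b == null || b - a.getValue() > timeoutMs) {
                result.add(a.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> a = Map.of("k1", 0L, "k2", 10L, "k3", 20L);
        Map<String, Long> b = Map.of("k1", 5L, "k2", 260L); // k2 is late, k3 has no B
        System.out.println(missingOrLate(a, b, 100L));
    }
}
```

With a 100 ms timeout, k1 is matched (B came 5 ms later), k2 is reported because its B came 250 ms later, and k3 is reported because it has no B at all.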

This is probably quite a common requirement, but it doesn't seem as easy as I first thought. Any thoughts, pointers, or tips would be much appreciated. Thanks.

OK, I have something that seems to work. After the left join (which of course has a window), I do a .groupByKey().count(). From there I can send keys with a count < 2 to one stream ("missing") and the rest to another stream ("good"), e.g. for analysis or calculating metrics. I think filter() and branch() would do the routing, although I haven't done it yet.
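The count-and-split idea can be sketched in plain Java as follows. This models only the final counts per key; it is not the Kafka Streams DSL, and `CountSplitModel` and `split` are hypothetical names. One caveat, as far as I understand it: in Kafka Streams, count() produces a continuously updated table, so a downstream filter on count < 2 may also see the intermediate count of 1 for keys that later reach 2, unless the counts are only read once they are final.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of "count join rows per key, then split on count < 2". */
final class CountSplitModel {

    /**
     * Input: the key of each join output row, one entry per row.
     * Output: keys seen fewer than twice under "missing", the rest under "good".
     */
    static Map<String, List<String>> split(List<String> joinRowKeys) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String key : joinRowKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        List<String> missing = new ArrayList<>();
        List<String> good = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // One row means only A-[null] was produced; two rows means AB followed.
            (e.getValue() < 2 ? missing : good).add(e.getKey());
        }
        Map<String, List<String>> out = new HashMap<>();
        out.put("missing", missing);
        out.put("good", good);
        return out;
    }

    public static void main(String[] args) {
        // k1 produced two join rows (A-[null], then AB); k2 produced only one.
        Map<String, List<String>> result = split(List.of("k1", "k1", "k2"));
        System.out.println("missing=" + result.get("missing") + " good=" + result.get("good"));
    }
}
```

Here k1 lands in "good" and k2 in "missing", which matches the routing described above.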

I tried using .windowedBy(TimeWindows.of(ofMillis(250)).grace(ofMillis(10))) and .suppress(Suppressed.untilWindowCloses(unbounded())) but got nowhere with it, so it's just as well that a groupByKey() with count() looks like all that is needed.
