I'm new to Kafka Streams. I've just put together a left join between Stream A and Stream B. In my setup every A has a B, which arrives a few milliseconds after A, but in real life there may be missing Bs, or Bs that arrive late (after, say, 250 ms). I want to be able to find these missing and late Bs. I thought it would be easy: just do a left join between A and B, specify the window, and job done. But to my surprise I get two rows in the left join output stream. Thinking about it, this makes sense: when A arrives there is no B yet, so a join row that looks like A-[null] is generated. A few milliseconds later B arrives, and then AB is generated.
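To see where the two rows come from, here is a small plain-Java simulation (no Kafka dependency) of the eager left join behaviour described above: each arriving A is checked against the Bs already seen for its key within the window, and each arriving B against the stored As. The class name, the Event record and the "key:A-null" / "key:AB" output strings are hypothetical, chosen only for illustration; the sketch ignores partitions, retention and grace periods.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of an eager stream-stream left join (A left-join B).
// Hypothetical names throughout; no Kafka classes are used.
final class EagerLeftJoinSketch {

    // side is "A" or "B"; ts is the event timestamp in milliseconds
    record Event(String side, String key, long ts) {}

    static List<String> leftJoin(List<Event> events, long windowMs) {
        List<Event> seenA = new ArrayList<>();
        List<Event> seenB = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            if (e.side().equals("A")) {
                boolean matched = false;
                for (Event b : seenB) {
                    if (joins(e, b, windowMs)) {
                        out.add(e.key() + ":AB");
                        matched = true;
                    }
                }
                // No B seen yet: the left join emits A joined with null immediately
                if (!matched) {
                    out.add(e.key() + ":A-null");
                }
                seenA.add(e);
            } else {
                // A B joins with every stored A for its key inside the window,
                // even though A-null has already been emitted for that A
                for (Event a : seenA) {
                    if (joins(a, e, windowMs)) {
                        out.add(e.key() + ":AB");
                    }
                }
                // A B with no matching A produces nothing on its own
                seenB.add(e);
            }
        }
        return out;
    }

    // Same key and timestamps no more than windowMs apart (symmetric window)
    private static boolean joins(Event a, Event b, long windowMs) {
        return a.key().equals(b.key()) && Math.abs(a.ts() - b.ts()) <= windowMs;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("A", "k1", 0),
                new Event("B", "k1", 5),    // B a few ms after A
                new Event("A", "k2", 10));  // B for k2 never arrives
        // prints [k1:A-null, k1:AB, k2:A-null]
        System.out.println(leftJoin(events, 250));
    }
}
```

Newer Kafka Streams releases (3.1 onwards, as far as I know) can hold back the A-[null] row until the join window has closed when the window is built with JoinWindows.ofTimeDifferenceAndGrace(...) or JoinWindows.ofTimeDifferenceWithNoGrace(...); it is worth checking the JoinWindows Javadoc for the version in use.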
What I want is the A messages that do not have a corresponding B within, say, 100 ms of A: B might be late or might never arrive, but either way it did not arrive within 100 ms. Is there a standard pattern or idiomatic way to do this? At the moment I'm thinking I might need a consumer that receives each A and then fires a message after a set time (although I'm not sure how that would be done without some clunky synchronous code), and then join that message (call it Ax) with B.
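The "fire a message after a set time" idea does not necessarily need blocking code if time is driven by the event timestamps themselves: each A registers a deadline at its timestamp plus 100 ms, a matching B cancels it, and whenever the latest timestamp seen passes a deadline, that key is reported as Ax. The following is a plain-Java sketch of that idea under stated assumptions (hypothetical class and method names, one pending A per key, the largest timestamp seen so far used as "now"). Inside Kafka Streams the same idea would normally live in the Processor API with a state store and a punctuator, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Event-time deadline sketch: report each A whose B has not arrived within maxDelayMs.
// Hypothetical names; plain Java, no Kafka classes. Assumes one pending A per key.
final class DeadlineSketch {

    // side is "A" or "B"; ts is the event timestamp in milliseconds
    record Event(String side, String key, long ts) {}

    static List<String> findMissingOrLate(List<Event> events, long maxDelayMs) {
        Map<String, Long> deadlines = new HashMap<>(); // key -> A.ts + maxDelayMs
        List<String> ax = new ArrayList<>();           // keys reported as "Ax"
        long now = Long.MIN_VALUE;                     // stream time: max timestamp seen
        for (Event e : events) {
            now = Math.max(now, e.ts());
            fireExpired(deadlines, now, ax);           // time moved on: check deadlines first
            if (e.side().equals("A")) {
                deadlines.put(e.key(), e.ts() + maxDelayMs);
            } else {
                deadlines.remove(e.key());             // B before its deadline fired: cancel Ax
            }
        }
        // Deadlines still pending here wait for later events to advance stream time
        return ax;
    }

    private static void fireExpired(Map<String, Long> deadlines, long now, List<String> ax) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Long> d : deadlines.entrySet()) {
            if (d.getValue() < now) {
                fired.add(d.getKey());
            }
        }
        Collections.sort(fired);                       // deterministic order for the example
        for (String key : fired) {
            deadlines.remove(key);
            ax.add(key);
        }
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("A", "k1", 0),
                new Event("B", "k1", 5),      // in time
                new Event("A", "k2", 10),     // B never arrives
                new Event("A", "k3", 20),
                new Event("B", "k3", 300),    // late
                new Event("A", "k4", 400));   // still pending at the end
        // prints [k2, k3]
        System.out.println(findMissingOrLate(events, 100));
    }
}
```

A B that arrives exactly 100 ms after its A counts as in time here; whether the boundary should be inclusive is a choice the real requirement would have to settle.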
This is probably quite a common requirement, but it doesn't seem as easy as I first thought. Any thoughts, pointers or tips would be much appreciated. Thanks.
OK, I have something that seems to work. All I need to do, after the left join (which of course has a window), is .groupByKey().count(). After that I can, for example, send records with a count < 2 to one stream ("missing") and the others to a "good" stream, e.g. for analysis and calculation of metrics. I think I can do this with filter() and branch(), although I haven't done it yet.
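The count-and-split step can be sketched in plain Java over the join results for one window. This is a hypothetical illustration, not Kafka code: join results are plain strings of the form "key:A-null" or "key:AB", and it assumes, as in the setup above, that B arrives after A (if B arrived first, only AB would be emitted and the count would be 1). Note also that in a live topology count() produces an updating KTable: each key first shows a count of 1 and is only updated to 2 when B arrives, so a filter on count < 2 applied directly to its updates can also flag keys whose B is merely a few ms behind. The sketch sidesteps this by counting only after all join results are in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of groupByKey().count() followed by a split on count < 2.
// Join results are plain strings "key:A-null" or "key:AB" (hypothetical format).
final class CountSplitSketch {

    record Split(List<String> missing, List<String> good) {}

    static Split split(List<String> joinResults) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String r : joinResults) {
            String key = r.substring(0, r.indexOf(':'));
            counts.merge(key, 1, Integer::sum);
        }
        List<String> missing = new ArrayList<>();
        List<String> good = new ArrayList<>();
        for (Map.Entry<String, Integer> c : counts.entrySet()) {
            // Only the A-[null] row was produced: no B joined within the window
            if (c.getValue() < 2) {
                missing.add(c.getKey());
            } else {
                good.add(c.getKey());
            }
        }
        return new Split(missing, good);
    }

    public static void main(String[] args) {
        Split s = split(List.of("k1:A-null", "k1:AB", "k2:A-null"));
        // prints missing=[k2] good=[k1]
        System.out.println("missing=" + s.missing() + " good=" + s.good());
    }
}
```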
I tried using .windowedBy(TimeWindows.of(ofMillis(250)).grace(ofMillis(10))) and .suppress(Suppressed.untilWindowCloses(unbounded())), but got nowhere with it. By the looks of things that's just as well, since a groupByKey() with count() is all that is needed.
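For reference, suppress(Suppressed.untilWindowCloses(...)) is meant to hold back a windowed aggregate's intermediate updates and emit one final result per key and window once stream time reaches the window end plus the grace period. The plain-Java sketch below mimics that behaviour for counts of join results. The names are hypothetical, the windows are epoch-aligned tumbling windows as TimeWindows.of produces, and the close condition is an approximation rather than the Kafka Streams implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Mimics windowedBy(<tumbling window with grace>).count().suppress(untilWindowCloses(...)):
// counts are buffered per (window, key) and emitted once, when the window closes.
// Hypothetical names; an approximation, not the Kafka Streams implementation.
final class SuppressSketch {

    // One join output row (A-[null] or AB) with its key and timestamp in ms
    record JoinResult(String key, long ts) {}

    // windowStart -> (key -> count); TreeMaps keep windows and keys ordered
    private final TreeMap<Long, TreeMap<String, Long>> buffer = new TreeMap<>();
    private final long sizeMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE;
    final List<String> emitted = new ArrayList<>(); // final results as "windowStart/key=count"

    SuppressSketch(long sizeMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    void process(JoinResult r) {
        long windowStart = r.ts() - (r.ts() % sizeMs); // assumes ts >= 0
        streamTime = Math.max(streamTime, r.ts());
        if (windowStart + sizeMs + graceMs <= streamTime) {
            return; // window already closed: a too-late record is dropped
        }
        buffer.computeIfAbsent(windowStart, w -> new TreeMap<>()).merge(r.key(), 1L, Long::sum);
        // Emit, then forget, every window whose end plus grace stream time has reached
        while (!buffer.isEmpty() && buffer.firstKey() + sizeMs + graceMs <= streamTime) {
            long start = buffer.firstKey();
            for (Map.Entry<String, Long> c : buffer.remove(start).entrySet()) {
                emitted.add(start + "/" + c.getKey() + "=" + c.getValue());
            }
        }
    }

    public static void main(String[] args) {
        SuppressSketch s = new SuppressSketch(250, 10);
        s.process(new JoinResult("k1", 0));   // A-[null]
        s.process(new JoinResult("k1", 5));   // AB
        s.process(new JoinResult("k2", 10));  // A-[null], B never arrives
        s.process(new JoinResult("k3", 600)); // later traffic advances stream time
        // prints [0/k1=2, 0/k2=1]
        System.out.println(s.emitted);
    }
}
```

Because suppression is driven by stream time, a window's final count only appears after a newer record arrives (in Kafka Streams, typically on the same partition); on a quiet test topic this can look as if suppress() produces nothing at all. Once final counts are emitted, a filter on count < 2 picks out k2 without first flagging k1. Join rows for one A that land on opposite sides of a window boundary would, however, be counted in different windows.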