
How to do a Kafka Streams Left Join that returns LHS messages that have no corresponding RHS after a fixed period?

I'm new to Kafka Streams. I've just put together a left join between Stream A and Stream B. In my setup, every A has a B that arrives a few milliseconds after it, but in real life there may be missing B's, or B's that arrive late (after, say, 250ms). I want to be able to find these (missing and late B's). I thought it would be easy - just do a left join between A and B, specify the window, and job done. But I found to my surprise that I get 2 rows in the left join stream output. Thinking about it, this makes sense - when A arrives, there is no B yet, so a join row that looks like A-[null] is generated. A few milliseconds later, B arrives, and then A-B is generated.
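To make the two-row behaviour concrete, here is a minimal plain-Java model of it. This is only a sketch of the eager emission described above, not Kafka Streams code: the class `EagerLeftJoinSketch`, the `Event` type, the sample keys and the 250ms window are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EagerLeftJoinSketch {
    // Hypothetical input record: source stream ("A" or "B"), key, timestamp in ms.
    static final class Event {
        final String side;
        final String key;
        final long ts;

        Event(String side, String key, long ts) {
            this.side = side;
            this.key = key;
            this.ts = ts;
        }
    }

    // Simplified model of an eager windowed left join with A as the left side:
    // every A emits a row immediately, and a B emits a row only if a matching A
    // lies within windowMs of it.
    static List<String> leftJoin(List<Event> events, long windowMs) {
        Map<String, Long> lastA = new HashMap<>();
        Map<String, Long> lastB = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            if (e.side.equals("A")) {
                lastA.put(e.key, e.ts);
                Long b = lastB.get(e.key);
                boolean matched = b != null && Math.abs(e.ts - b) <= windowMs;
                out.add(e.key + (matched ? ":A-B" : ":A-[null]"));
            } else {
                lastB.put(e.key, e.ts);
                Long a = lastA.get(e.key);
                if (a != null && Math.abs(e.ts - a) <= windowMs) {
                    out.add(e.key + ":A-B");
                }
            }
        }
        return out;
    }

    // Sample input: k1 has a prompt B, k2 has a B outside the window, k3 has no B.
    static List<Event> sampleEvents() {
        List<Event> events = new ArrayList<>();
        events.add(new Event("A", "k1", 0));
        events.add(new Event("B", "k1", 5));
        events.add(new Event("A", "k2", 10));
        events.add(new Event("A", "k3", 20));
        events.add(new Event("B", "k2", 400));
        return events;
    }

    public static void main(String[] args) {
        // Prints [k1:A-[null], k1:A-B, k2:A-[null], k3:A-[null]]
        System.out.println(leftJoin(sampleEvents(), 250));
    }
}
```

In this model k1 produces two rows, while k2 (late B) and k3 (missing B) each produce only the A-[null] row.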

What I want is to get those A messages that do not have a corresponding B after, say, 100ms - B could be late; it might never arrive; but either way it did not arrive within 100ms of A. Is there a standard pattern / idiomatic way to do this? At the moment I am thinking that maybe I would need a consumer that receives the A and then fires a message after a set time (although I'm not exactly sure how that would be done without some clunky synchronous code), and then I would have to join between that (call it Ax) and B.

This is probably quite a common requirement, but it doesn't seem as easy as I first thought. Any thoughts/pointers/tips would be much appreciated. Thanks.

OK, I have something that seems to work. All I need to do is, after the left join (which of course has a window), do a .groupByKey().count(). After that I can, e.g., send records with a count < 2 to one stream ("missing") and the others to another "good" stream, e.g. for analysis/calculation of metrics etc. (using filter() and branch(), I think, although I haven't done it yet).
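The count-and-split idea can be sketched in plain Java as follows. This is a batch model only, assuming the join output has already been collected as a list of row keys; `CountSplitSketch` and its method names are hypothetical and are not part of Kafka Streams.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountSplitSketch {
    // Count how many join rows were produced per key (models groupByKey().count()).
    static Map<String, Integer> countByKey(List<String> joinRowKeys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : joinRowKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Keys with fewer than 2 rows only ever produced A-[null]: their B is missing or late.
    static List<String> missing(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < 2) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Keys with 2 or more rows produced both A-[null] and A-B.
    static List<String> good(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= 2) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys of the join rows: k1 appears twice (A-[null], then A-B), k2 and k3 once.
        Map<String, Integer> counts = countByKey(List.of("k1", "k1", "k2", "k3"));
        System.out.println("missing=" + missing(counts)); // prints missing=[k2, k3]
        System.out.println("good=" + good(counts));       // prints good=[k1]
    }
}
```

One caveat about this model: it looks at final counts. In a live topology the count is updated as each row arrives, so a key would briefly show a count of 1 before its B turns up, and a downstream filter on count < 2 would need to account for that.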

I tried using .windowedBy(TimeWindows.of(ofMillis(250)).grace(ofMillis(10))) and .suppress(Suppressed.untilWindowCloses(unbounded())); but got nowhere with it, so it's just as well that, by the looks of things, a groupBy with count is all that is needed.
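One possible reason the suppress attempt appeared to produce nothing, offered as an assumption rather than a diagnosis of the setup above: a window-close suppression emits a window's final result only once stream time has passed the window end plus the grace period, and stream time advances only when new records arrive. The plain-Java model below illustrates that rule for 250ms tumbling windows with 10ms grace; `WindowCloseSketch` and `finalCounts` are made-up names, and non-negative timestamps are assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowCloseSketch {
    // Model of "emit final counts only after the window closes".
    // A window [start, start + sizeMs) closes once the highest timestamp seen so far
    // (stream time) reaches start + sizeMs + graceMs.
    static List<String> finalCounts(long[] ts, String[] keys, long sizeMs, long graceMs) {
        TreeMap<Long, Map<String, Integer>> open = new TreeMap<>();
        List<String> emitted = new ArrayList<>();
        long streamTime = Long.MIN_VALUE;
        for (int i = 0; i < ts.length; i++) {
            // Stream time only moves forward when a record arrives.
            streamTime = Math.max(streamTime, ts[i]);
            long start = ts[i] - (ts[i] % sizeMs);
            if (start + sizeMs + graceMs > streamTime) {
                // Window still open: update its per-key count.
                open.computeIfAbsent(start, s -> new LinkedHashMap<>())
                    .merge(keys[i], 1, Integer::sum);
            }
            // Emit and drop every window that stream time has now closed.
            while (!open.isEmpty() && open.firstKey() + sizeMs + graceMs <= streamTime) {
                Map.Entry<Long, Map<String, Integer>> w = open.pollFirstEntry();
                for (Map.Entry<String, Integer> c : w.getValue().entrySet()) {
                    emitted.add(w.getKey() + ":" + c.getKey() + "=" + c.getValue());
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Three records, then silence: stream time stays at 10, so nothing is emitted.
        System.out.println(finalCounts(
            new long[] {0, 5, 10}, new String[] {"k1", "k1", "k2"}, 250, 10)); // prints []
        // A later record at 300ms pushes stream time past 260 and closes window 0.
        System.out.println(finalCounts(
            new long[] {0, 5, 10, 300}, new String[] {"k1", "k1", "k2", "k9"}, 250, 10));
        // prints [0:k1=2, 0:k2=1]
    }
}
```

In a short test run where input simply stops, the last window would never close under this rule, which would look like the operator producing no output.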

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.