
Apache Flink: join different DataStreams on a specific key

I have two streams. The first, DataStream<String> source, receives records from a message broker. The second, SingleOutputStreamOperator<Event> events, is the result of mapping source into Event.class.

Some of my use cases need SingleOutputStreamOperator<Event> events and others need DataStream<String> source. In one of the use cases that works on DataStream<String> source, I apply some filters and get a SingleOutputStreamOperator<String> result. I want to avoid mapping the source into Event.class again, because that operation is already done and I already have that stream. So I need to look up each record of result in SingleOutputStreamOperator<Event> events by id, and then apply another map to produce a SingleOutputStreamOperator<EventOutDto> out.

This is the idea, as an example:

// Placeholder from the question: read the raw records from the message broker.
DataStream<String> source = env.readFrom(source);
// Parse every raw record into an Event once.
SingleOutputStreamOperator<Event> events = source.map(s -> mapper.readValue(s, Event.class));


public void filterAndJoin(DataStream<String> source, SingleOutputStreamOperator<Event> events) {

  // Pass the filter instance itself, not a lambda that creates one.
  SingleOutputStreamOperator<String> filtered = source.filter(new FilterFunction());

  // Pseudocode: for each record in the filtered stream, search the events stream
  // for the event whose id matches and return that event if found.
  (join of filtered and events on id)
      .map(event -> new EventOutDto(event))
      .addSink(new RichSinkFunction());
}

I have this code:

filtered.join(events)
    .where(k -> {
        // Key of the raw JSON record: its "Id" field, or "" if the field is missing.
        JsonNode tree = mapper.readTree(k);
        String id = "";
        if (tree.get("Id") != null) {
            id = tree.get("Id").asText();
        }
        return id;
    })
    // Key of the already parsed Event.
    .equalTo(e -> e.Id)
    .window(TumblingEventTimeWindows.of(Time.seconds(1)))
    // The third type parameter must match the return type of join().
    .apply(new JoinFunction<String, Event, EventOutDto>() {
        @Override
        public EventOutDto join(String s, Event event) throws Exception {
            return new EventOutDto(event);
        }
    })
    .addSink(new SinkFunction());

The code above runs without errors and the ids are the same on both sides, so where(id).equalTo(id) should match, but execution never reaches the apply function.

Observation: watermarks are assigned with the same timestamp.

Questions:

  • Any idea why?
  • Have I explained myself clearly?
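
One possible explanation for the window join never firing, sketched in plain Java without Flink on the classpath: an event-time window only fires once the watermark reaches the last timestamp that belongs to the window. The class and method names below are hypothetical, and the watermark formula assumes a monotonous-timestamps strategy (watermark = highest timestamp seen minus 1 ms); the question does not say which strategy is actually used. Under that assumption, if every record carries the same timestamp, the watermark stays below the window end and apply is never called.

```java
public class WindowFiringSketch {

    // Start of the tumbling window that contains `timestamp`
    // (assumes offset 0 and non-negative timestamps).
    public static long windowStart(long timestamp, long sizeMs) {
        return timestamp - (timestamp % sizeMs);
    }

    // Largest timestamp that still belongs to the window [start, start + size).
    public static long maxTimestamp(long windowStart, long sizeMs) {
        return windowStart + sizeMs - 1;
    }

    // Assumption: monotonous-timestamps strategy, so the watermark trails
    // the highest timestamp seen so far by 1 ms.
    public static long watermarkAfter(long maxSeenTimestamp) {
        return maxSeenTimestamp - 1;
    }

    // An event-time window fires once the watermark reaches its max timestamp.
    public static boolean fires(long watermark, long windowStart, long sizeMs) {
        return watermark >= maxTimestamp(windowStart, sizeMs);
    }

    public static void main(String[] args) {
        long size = 1_000L;                       // 1 second tumbling window
        long start = windowStart(5_000L, size);   // window [5000, 6000)

        // Every record has timestamp 5000: the watermark is stuck at 4999.
        System.out.println(fires(watermarkAfter(5_000L), start, size)); // false

        // A later record at 6000 moves the watermark to 5999, the window's max timestamp.
        System.out.println(fires(watermarkAfter(6_000L), start, size)); // true
    }
}
```

If this is indeed the cause, the window would fire only after a later record pushes the watermark past the window end, or when a bounded source finishes and the final watermark is emitted.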

I solved the join by doing this:

SingleOutputStreamOperator<ObjectDTO> triggers = candidates
    .keyBy(new KeySelector())
    // Join each candidate with events of the same key whose timestamp lies
    // between 2 ms before and 1 ms after the candidate's timestamp.
    .intervalJoin(keyedStream.keyBy(e -> e.Id))
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessFunctionOne())
    .keyBy(k -> k.otherId)
    .process(new ProcessFunctionTwo());
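
For readers unfamiliar with intervalJoin, the matching rule behind between(lower, upper) can be sketched in plain Java. This is a simplified model with hypothetical names, not Flink's implementation: a right-side record with the same key joins a left-side record when its timestamp lies in [left + lower, left + upper], with both bounds inclusive by default. Keys are left out of the sketch, so all records are assumed to share one key.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalJoinSketch {

    // True when rightTs lies in [leftTs + lowerMs, leftTs + upperMs], bounds inclusive.
    public static boolean withinInterval(long leftTs, long rightTs, long lowerMs, long upperMs) {
        return rightTs >= leftTs + lowerMs && rightTs <= leftTs + upperMs;
    }

    // Returns every {leftTs, rightTs} pair that satisfies the interval condition.
    public static List<long[]> join(long[] leftTs, long[] rightTs, long lowerMs, long upperMs) {
        List<long[]> pairs = new ArrayList<>();
        for (long l : leftTs) {
            for (long r : rightTs) {
                if (withinInterval(l, r, lowerMs, upperMs)) {
                    pairs.add(new long[] {l, r});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        long[] candidates = {100L};
        long[] events = {97L, 98L, 100L, 101L, 102L};

        // Same bounds as in the post: between(-2 ms, +1 ms).
        for (long[] pair : join(candidates, events, -2L, 1L)) {
            System.out.println(pair[0] + " joins " + pair[1]);
        }
        // Matches are 98, 100 and 101; 97 and 102 fall outside the interval.
    }
}
```

Unlike the tumbling window join, this condition is evaluated per pair of records, so two records with identical timestamps satisfy it directly.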
