Apache Flink: join different DataStreams on a specific key
I have two DataStreams. The first one, DataStream<String> source, receives records from a message broker. The second one, SingleOutputStreamOperator<Event> events, is the result of mapping source into Event.class.
I have use cases that need SingleOutputStreamOperator<Event> events and others that use DataStream<String> source.

In one of the use cases that uses DataStream<String> source, I need to join SingleOutputStreamOperator<String> result (obtained after applying some filters) with events. I want to avoid mapping source into Event.class again, because that operation has already been done and its stream already exists. So for each record in SingleOutputStreamOperator<String> result, I need to look up the matching record in SingleOutputStreamOperator<Event> events, and then apply another map to produce SingleOutputStreamOperator<EventOutDto> out.
Here is the idea as an example:
// brokerSource, watermarkStrategy, MyFilterFunction and MySinkFunction are placeholder names
DataStream<String> source = env.fromSource(brokerSource, watermarkStrategy, "broker-source");
SingleOutputStreamOperator<Event> events = source.map(s -> mapper.readValue(s, Event.class));

public void filterAndJoin(DataStream<String> source, SingleOutputStreamOperator<Event> events) {
    SingleOutputStreamOperator<String> filtered = source.filter(new MyFilterFunction());
    // result: for each record in `filtered`, find the Event in `events` whose id matches
    // the record's id and, if one is found, map it with new EventOutDto(event)
    SingleOutputStreamOperator<EventOutDto> result = ...;
    result.addSink(new MySinkFunction());
}
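To make the intended result concrete, here is a plain-Java sketch over bounded lists. It is not Flink code and not a streaming join; Event and EventOutDto are simplified stand-ins for the question's classes, and the filtered records are assumed to be already reduced to their ids. It only shows what the join should produce: one EventOutDto per filtered id that has a matching Event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Bounded, plain-Java analogue of "look up each filtered id in events" (not Flink code).
class LookupByIdSketch {
    // Simplified stand-in for the question's Event class (assumption).
    static class Event {
        final String id;
        final String payload;
        Event(String id, String payload) { this.id = id; this.payload = payload; }
    }

    // Simplified stand-in for the question's EventOutDto class (assumption).
    static class EventOutDto {
        final String id;
        final String payload;
        EventOutDto(Event e) { this.id = e.id; this.payload = e.payload; }
    }

    // For every filtered id, find the Event with the same id (if any) and map it to EventOutDto.
    static List<EventOutDto> lookup(List<String> filteredIds, List<Event> events) {
        Map<String, Event> byId = new HashMap<>();
        for (Event e : events) {
            byId.put(e.id, e); // if an id repeats, the last Event wins
        }
        List<EventOutDto> out = new ArrayList<>();
        for (String id : filteredIds) {
            Event match = byId.get(id);
            if (match != null) {
                out.add(new EventOutDto(match));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(new Event("a", "x"), new Event("b", "y"));
        List<EventOutDto> out = lookup(Arrays.asList("b", "c"), events);
        System.out.println(out.size() + " " + out.get(0).id); // prints "1 b"
    }
}
```

In a stream both sides are unbounded, so Flink has to bound this lookup somehow, for example with a window or with a time interval around each record.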
This is the code I have:
filtered.join(events)
        // key of the left (JSON String) side: its "Id" field, or "" if the field is missing
        .where(k -> {
            JsonNode tree = mapper.readTree(k);
            String id = "";
            if (tree.get("Id") != null) {
                id = tree.get("Id").asText();
            }
            return id;
        })
        // key of the right (Event) side
        .equalTo(e -> e.Id)
        .window(TumblingEventTimeWindows.of(Time.seconds(1)))
        // output type must match the return type of join(), i.e. EventOutDto
        .apply(new JoinFunction<String, Event, EventOutDto>() {
            @Override
            public EventOutDto join(String s, Event event) throws Exception {
                return new EventOutDto(event);
            }
        })
        .addSink(new MySinkFunction()); // placeholder for a SinkFunction implementation
With the code above everything seems fine: the ids on both sides are the same, so where(id).equalTo(id) should match, but processing never reaches the apply function.
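One reason a windowed join can stay silent is the way tumbling event-time windows group and fire. The sketch below is a simplified plain-Java model, not Flink's code, assuming a window offset of 0, non-negative millisecond timestamps and the default event-time trigger: a pair can only be joined when both records fall into the same window, and that window is evaluated only once the watermark reaches its last timestamp (start + size - 1).

```java
// Simplified model of tumbling event-time window assignment and firing (not Flink's code).
// Assumptions: offset 0, non-negative timestamps in ms, default event-time trigger.
class TumblingWindowSketch {
    // Start of the tumbling window that contains `timestamp`.
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Two records can only be joined by a window join if they land in the same window.
    static boolean sameWindow(long leftTs, long rightTs, long size) {
        return windowStart(leftTs, size) == windowStart(rightTs, size);
    }

    // The window [start, start + size) is evaluated once the watermark
    // reaches its max timestamp, start + size - 1.
    static boolean fires(long windowStart, long size, long watermark) {
        return watermark >= windowStart + size - 1;
    }

    public static void main(String[] args) {
        long size = 1000L; // Time.seconds(1)
        System.out.println(windowStart(1500L, size));      // 1000
        System.out.println(sameWindow(999L, 1000L, size)); // false: different windows
        System.out.println(fires(1000L, size, 1998L));     // false
        System.out.println(fires(1000L, size, 1999L));     // true
    }
}
```

So two records with the same id but timestamps 999 and 1000 never meet in a 1-second tumbling window, and nothing reaches apply until the watermark passes the end of the window.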
Observation: the watermarks are assigned with the same timestamp.
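Even when watermarks are assigned, an operator with several inputs, such as the join, uses the minimum watermark over all of its input channels (both sides of the join, across all parallel upstream instances). A simplified plain-Java model, assuming Long.MIN_VALUE stands for a channel that has not emitted any watermark yet (for example an idle source partition):

```java
import java.util.Arrays;

// Simplified model (not Flink's code): the combined watermark of an operator
// is the minimum of the watermarks of all of its input channels.
class CombinedWatermarkSketch {
    static long combinedWatermark(long... inputWatermarks) {
        // With no inputs, fall back to Long.MIN_VALUE (no progress in event time).
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // One channel never emitted a watermark, e.g. an idle source partition.
        long wm = combinedWatermark(5_000L, 5_000L, Long.MIN_VALUE);
        System.out.println(wm == Long.MIN_VALUE);              // true: no event-time window can fire
        System.out.println(combinedWatermark(5_000L, 3_000L)); // 3000
    }
}
```

If some source split or partition receives no data, WatermarkStrategy.withIdleness(...) can be used so that it does not hold back the combined watermark.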
Questions:
I solved the join by doing this:
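SingleOutputStreamOperator<ObjectDTO> triggers = candidates
        .keyBy(new KeySelector())                    // placeholder for my KeySelector implementation (key = id)
        .intervalJoin(keyedStream.keyBy(e -> e.Id))  // join with the events keyed by Event.Id
        .between(Time.milliseconds(-2), Time.milliseconds(1))
        .process(new ProcessFunctionOne())           // must be a ProcessJoinFunction
        .keyBy(k -> k.otherId)
        .process(new ProcessFunctionTwo());          // e.g. a KeyedProcessFunction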
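Unlike a tumbling window, whose boundaries are fixed, an interval join lets each left element define its own time range on the other stream. A simplified plain-Java model of the matching condition for two elements with the same key (bounds in milliseconds, both inclusive, which is Flink's default unless lowerBoundExclusive() or upperBoundExclusive() is called):

```java
// Simplified model (not Flink's code) of the interval join condition for elements with the same key:
// a right element matches a left element when left.ts + lower <= right.ts <= left.ts + upper.
class IntervalJoinSketch {
    static boolean matches(long leftTs, long rightTs, long lowerMs, long upperMs) {
        return rightTs >= leftTs + lowerMs && rightTs <= leftTs + upperMs;
    }

    public static void main(String[] args) {
        long lower = -2L, upper = 1L; // between(Time.milliseconds(-2), Time.milliseconds(1))
        System.out.println(matches(1000L, 998L, lower, upper));  // true
        System.out.println(matches(1000L, 1001L, lower, upper)); // true
        System.out.println(matches(1000L, 1002L, lower, upper)); // false
        System.out.println(matches(999L, 1000L, lower, upper));  // true, across a 1 s boundary
    }
}
```

Here candidates is the left side and the keyed events stream is the right side, so an event is joined if its timestamp lies between 2 ms before and 1 ms after the candidate's timestamp.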