Setting variables in Apache Flink

I'm asking this question because I'm having trouble setting variables in Apache Flink. I would like to use one stream to fetch data with which I will initialize the variables I need for a second stream. The problem is that the streams execute in parallel, so the value is missing when the second stream is initialized. Sample code:

KafkaSource<Object> mainSource1 = KafkaSource.<Object>builder()
      .setBootstrapServers(...)
      .setTopicPattern(Pattern.compile(...))
      .setGroupId(...)
      .setStartingOffsets(OffsetsInitializer.earliest())
      .setDeserializer(new ObjectDeserializer())
      .build();

DataStream<Market> mainStream1 = env.fromSource(mainSource1, WatermarkStrategy.forMonotonousTimestamps(), "mainSource");


// fetching data from the stream and setting variables


Map<TopicPartition, Long> endOffset = new HashMap<>();
endOffset.put(new TopicPartition("topicName", 0), offsetFromMainStream1);



KafkaSource<Object> mainSource2 = KafkaSource.<Object>builder()
      .setBootstrapServers(...)
      .setTopicPattern(Pattern.compile(...))
      .setGroupId(...)
      .setStartingOffsets(OffsetsInitializer.earliest())
      .setBounded(OffsetsInitializer.offsets(endOffset))
      .setDeserializer(new ObjectDeserializer())
      .build();

DataStream<Market> mainStream2 = env.fromSource(mainSource2, WatermarkStrategy.forMonotonousTimestamps(), "mainSource");

// further stream operations


I would like to run the first stream, fetch the data from it and store it locally, so that I can then use it in operations on the second stream.

You want to use one stream's data to control another stream's behavior. The best way to do that is the broadcast state pattern.

This involves creating a BroadcastStream from mainStream1, and then connecting mainStream2 to that broadcast stream. The function that processes mainStream2 can then access the data from mainStream1.

Here is a high-level example based on your code. I am assuming that the key is a String.

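// Descriptor of the broadcast state: String key -> Market value
MapStateDescriptor<String, Market> stateDescriptor = new MapStateDescriptor<>(
        "RulesBroadcastState",
        BasicTypeInfo.STRING_TYPE_INFO,
        TypeInformation.of(new TypeHint<Market>() {}));

// Broadcast mainStream1 and create the broadcast state
BroadcastStream<Market> mainStream1BroadcastStream = mainStream1.broadcast(stateDescriptor);

DataStream<Market> yourOutput = mainStream2
        // KeyedBroadcastProcessFunction needs a keyed input; getId() is an assumed accessor on Market
        .keyBy(market -> market.getId())
        .connect(mainStream1BroadcastStream)
        .process(
            new KeyedBroadcastProcessFunction<String, Market, Market, Market>() {
                @Override
                public void processBroadcastElement(Market value, Context ctx, Collector<Market> out) throws Exception {
                    // mainStream1 output arrives here; store it with ctx.getBroadcastState(stateDescriptor)
                }

                @Override
                public void processElement(Market value, ReadOnlyContext ctx, Collector<Market> out) throws Exception {
                    // mainStream2 data arrives here; read the stored values with ctx.getBroadcastState(stateDescriptor)
                }
            }
        );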
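
One thing to keep in mind with this pattern: Flink gives no guarantee that the broadcast element reaches a task before the first elements of mainStream2 do, so processElement may run while the broadcast state is still empty. A common way to deal with that is to hold early elements back and flush them once the broadcast value arrives. The sketch below models only that logic in plain Java, without any Flink dependency. BufferingModel, onBroadcast, onElement and the "keep offsets below the end offset" rule are hypothetical names and choices made for illustration; they stand in for processBroadcastElement, processElement and whatever you actually do with the value.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "buffer until the broadcast value arrives".
// Not Flink API: BufferingModel and its methods are hypothetical names.
final class BufferingModel {
    private Long endOffset;                               // stands in for broadcast state; null until received
    private final List<Long> pending = new ArrayList<>(); // stands in for state holding early elements
    private final List<Long> output = new ArrayList<>();  // stands in for the Collector

    // Counterpart of processBroadcastElement: store the value, then flush what was held back.
    void onBroadcast(long offset) {
        endOffset = offset;
        for (long element : pending) {
            emitIfInRange(element);
        }
        pending.clear();
    }

    // Counterpart of processElement: hold the element if the broadcast value is still missing.
    void onElement(long elementOffset) {
        if (endOffset == null) {
            pending.add(elementOffset);
        } else {
            emitIfInRange(elementOffset);
        }
    }

    // Illustrative rule: only elements below the end offset are emitted.
    private void emitIfInRange(long elementOffset) {
        if (elementOffset < endOffset) {
            output.add(elementOffset);
        }
    }

    List<Long> output() {
        return output;
    }

    public static void main(String[] args) {
        BufferingModel model = new BufferingModel();
        model.onElement(1L);   // arrives before the broadcast value: buffered
        model.onElement(7L);   // buffered as well
        model.onBroadcast(5L); // flushes the buffer: 1 is emitted, 7 is dropped
        model.onElement(3L);   // emitted directly
        System.out.println(model.output()); // prints [1, 3]
    }
}
```

In an actual KeyedBroadcastProcessFunction the pending list would normally live in keyed state (for example a ListState) rather than in a field, so that it is checkpointed together with the rest of the job.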

This concept is explained in detail in the Flink documentation, and the code above is a modified version of the example shown there: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/broadcast_state/#the-broadcast-state-pattern
