Flink: is the SourceFunction<> being replaced in StreamExecutionEnvironment.addSource()?

I ran into this problem while trying to create a custom event source. The source holds a queue that my other process adds items to, and I expect my CEP pattern to print some debug messages when there is a match.

But nothing matches, no matter what I add to the queue. I then noticed that the queue inside mySource.run() is always empty, which means the queue I used to create the mySource instance is not the same one that StreamExecutionEnvironment ends up using. If I make the queue static, which forces all instances to share the same queue, everything works as expected.

DummySource.java

    import java.util.Queue;

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class DummySource implements SourceFunction<String> {

        private static final long serialVersionUID = 3978123556403297086L;
        // Static variant: one queue shared by every instance in the same JVM.
        // private static Queue<String> queue = new LinkedBlockingQueue<String>();
        private Queue<String> queue;
        // volatile so that a cancel() call from another thread is visible to the loop in run()
        private volatile boolean cancel = false;

        public void setQueue(Queue<String> q) {
            queue = q;
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            System.out.println("run");
            synchronized (queue) {
                // Busy-polls the queue until cancel() is called.
                while (!cancel) {
                    if (queue.peek() != null) {
                        String e = queue.poll();
                        // The "exit" element stops the source; it is still emitted below.
                        if (e.equals("exit")) {
                            cancel();
                        }
                        System.out.println("collect " + e);
                        ctx.collectWithTimestamp(e, System.currentTimeMillis());
                    }
                }
            }
        }

        @Override
        public void cancel() {
            System.out.println("canceled");
            cancel = true;
        }
    }

So I dug into the source code of StreamExecutionEnvironment. Inside the addSource() method there is a call to clean(), which looks like it replaces the instance with a new one.

Returns a "closure-cleaned" version of the given function.

Why is that, and why does the function need to be serialized? I also tried turning off closure cleaning through getConfig(), but the result is the same: my queue instance is not the one that env is using.

How do I solve this problem?

The clean() method that Flink applies to functions is mainly there to make sure the Function (a SourceFunction or a MapFunction, for example) is serializable. Flink serializes those functions and distributes them to the task nodes, which execute them. The instance that actually runs is therefore a deserialized copy, so the queue it holds is a different object from the one you passed in.
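The effect of serialization on the queue can be reproduced without Flink. The following is a minimal sketch using plain Java serialization; QueueHolder, SerializationCopyDemo and roundTrip are hypothetical names made up for this illustration, and the round trip only approximates what happens when a function is shipped to a task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a source function that keeps a queue in a field.
class QueueHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    final Queue<String> queue;

    QueueHolder(Queue<String> queue) {
        this.queue = queue;
    }
}

class SerializationCopyDemo {

    // Writes the object to a byte array and reads it back, which yields a
    // brand-new object graph (assumption: this mimics shipping a function).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Queue<String> myQueue = new LinkedBlockingQueue<>();
        QueueHolder original = new QueueHolder(myQueue);
        QueueHolder copy = roundTrip(original);

        // Added after the copy was made, so only the original queue receives it.
        myQueue.add("event");

        System.out.println("same queue object: " + (copy.queue == myQueue));
        System.out.println("original size: " + original.queue.size());
        System.out.println("copy size: " + copy.queue.size());
    }
}
```

Running main prints false for the identity check, a size of 1 for the original queue and 0 for the copy, which matches the always-empty queue seen inside run().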

Simple variables in your Flink main code, such as an int, can be referenced directly in your function. For large or non-serializable objects, it is better to use broadcast variables and a rich source function. Please refer to https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables
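The idea behind a rich function is that small settings travel with the serialized function, while heavier or non-serializable state is created on the task side in an open()-style hook. Here is a rough sketch of that pattern in plain Java; TaskSideSource, RichStyleDemo, shipToTask and maxEvents are hypothetical names, no Flink API is called, and the open() method only mirrors the lifecycle hook that Flink's rich functions provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a rich source function; not a Flink class.
class TaskSideSource implements Serializable {
    private static final long serialVersionUID = 1L;

    // Small, serializable setting that travels with the function.
    private final int maxEvents;

    // transient: skipped by Java serialization, so it is null in the shipped
    // copy until open() builds it.
    private transient Queue<String> queue;

    TaskSideSource(int maxEvents) {
        this.maxEvents = maxEvents;
    }

    // Mirrors an open()-style hook that runs once where the function executes.
    void open() {
        queue = new LinkedBlockingQueue<>(maxEvents);
    }

    boolean isOpen() {
        return queue != null;
    }

    // Returns false when the bounded queue is already full.
    boolean offer(String event) {
        return queue.offer(event);
    }
}

class RichStyleDemo {

    // Serializes and deserializes the function (assumption: this approximates
    // how a function reaches the task that runs it).
    static TaskSideSource shipToTask(TaskSideSource source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TaskSideSource) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("shipping failed", e);
        }
    }

    public static void main(String[] args) {
        TaskSideSource shipped = shipToTask(new TaskSideSource(2));
        System.out.println("queue exists before open(): " + shipped.isOpen());

        shipped.open();
        shipped.offer("a");
        shipped.offer("b");
        System.out.println("third element accepted: " + shipped.offer("c"));
    }
}
```

Only maxEvents is serialized here; the queue is built after the copy arrives, so nothing large or non-serializable has to cross the wire.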
