
Apache Flink: How to implement a SourceFunction?

I have implemented a SourceFunction that fetches data (a String) from a URL. I then call keyBy() on that data and apply a 10-minute window. The problem is that the SourceFunction is called only once, and the window then operates on that single result for 10 minutes. How can I get data continuously from the SourceFunction?

DataStream<String> result = env.addSource(new MySource())   // run() is invoked only once
                        .keyBy(some keyBy function)
                        .window(for 10 minutes)  // operates for 10 minutes on the data the source fetched once
                        .process(some process function);

I want to run the SourceFunction repeatedly at a fixed time interval and let the window work on the continuously fetched data.

The run() method of your SourceFunction should be a loop that sleeps between iterations (or uses whatever other scheduling mechanism you prefer) to do the work.

A common pattern is to use some sort of atomic boolean that is set to true when run() is first called and set to false when cancel() is called.

So you have something like this in your run() method:

while (running) {
   // fetch some data, can be async
   ctx.collect(data);
   Thread.sleep(period);
}

You can implement that part however you see fit, but the important thing is that you do not exit the run() method of your SourceFunction until you are actually done or you have been cancelled.
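To make the loop-until-cancelled pattern concrete, here is a minimal sketch in plain Java. It deliberately does not depend on Flink: PollingSource, fetcher and periodMillis are hypothetical names, and a Consumer<String> stands in for the SourceContext. In a real job you would implement SourceFunction<String>, take a SourceContext<String> ctx in run(), and emit with ctx.collect(data) as in the loop above. The flag is initialized to true at construction rather than inside run(), which is a choice of this sketch so that a cancel() arriving before run() is not lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in that mirrors the run()/cancel() shape of a Flink SourceFunction,
// so the sketch compiles without Flink on the classpath.
class PollingSource {
    // Flipped to false by cancel(); checked once per loop iteration.
    private final AtomicBoolean running = new AtomicBoolean(true);
    // Hypothetical fetch step, e.g. an HTTP GET against your URL.
    private final Supplier<String> fetcher;
    private final long periodMillis;

    PollingSource(Supplier<String> fetcher, long periodMillis) {
        this.fetcher = fetcher;
        this.periodMillis = periodMillis;
    }

    // Does not return until cancel() has been called (or the thread is interrupted).
    void run(Consumer<String> out) throws InterruptedException {
        while (running.get()) {
            String data = fetcher.get();   // fetch one record
            out.accept(data);              // with Flink this would be ctx.collect(data)
            Thread.sleep(periodMillis);    // wait before the next fetch
        }
    }

    void cancel() {
        running.set(false);
    }
}

class PollingSourceDemo {
    public static void main(String[] args) throws Exception {
        List<String> received = Collections.synchronizedList(new ArrayList<>());
        PollingSource source = new PollingSource(() -> "payload", 10);

        // Run the source on its own thread, as a runtime would.
        Thread worker = new Thread(() -> {
            try {
                source.run(received::add);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        Thread.sleep(100);   // let it emit for a while
        source.cancel();     // ask the loop to stop
        worker.join();       // run() returns after its current iteration

        System.out.println("records emitted: " + received.size());
    }
}
```

Because run() keeps emitting a record every period, a downstream 10-minute window would receive every record fetched during that interval instead of a single one.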
