Apache Flink: How to implement a SourceFunction?
I have implemented a SourceFunction that fetches data (a String) from a URL. I then call keyBy() on that data and apply a 10-minute window. At the moment the SourceFunction is called only once, and the windows operate on that single result for 10 minutes. How can I get data continuously from the SourceFunction?
DataStream<String> result = env.addSource(new MySource()) // This runs only once
.keyBy(some keyBy function)
.window(for 10 minutes) // This runs for 10 minutes on the data obtained once by the source function
.process(some process function)
I want to run the SourceFunction repeatedly at a fixed time interval and let the window work on the continuously fetched data.
Your SourceFunction's run() method should be a loop that does the work and then sleeps (or uses whatever other scheduling mechanism you prefer) before the next iteration.
A common pattern is to use some sort of atomic boolean that you set to true when run is first called, and that gets set to false when cancel is called.
So you have something like this in your run method:
while (running) {
// fetch some data, can be async
ctx.collect(data);
Thread.sleep(period);
}
You can do that part however you see fit, but the important thing is that you do not exit the run method of your SourceFunction until you are actually done or you have been cancelled.
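To make the pattern concrete, here is a minimal sketch of the loop together with its cancel flag. It is written without the Flink dependency so it can be compiled on its own: `Collector` is a hypothetical stand-in for Flink's `SourceFunction.SourceContext`, and `PollingSource`, `fetcher` and `periodMillis` are illustrative names that do not appear in the question. In a real job the class would implement `SourceFunction<String>`, and `run` would receive a `SourceContext<String>` instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Sketch of the polling-loop pattern. Not Flink code: Collector is a
// hypothetical stand-in for SourceFunction.SourceContext.
public class PollingSource {

    /** Hypothetical stand-in for SourceContext<T>: receives emitted records. */
    public interface Collector<T> {
        void collect(T record);
    }

    // Set to true when run() starts, set to false by cancel().
    private final AtomicBoolean running = new AtomicBoolean(false);
    // Hypothetical fetcher; in the question this would be the HTTP call to the URL.
    private final Supplier<String> fetcher;
    private final long periodMillis;

    public PollingSource(Supplier<String> fetcher, long periodMillis) {
        this.fetcher = fetcher;
        this.periodMillis = periodMillis;
    }

    /** Emits one record per period and does not return until cancel() is called. */
    public void run(Collector<String> ctx) throws InterruptedException {
        running.set(true);
        while (running.get()) {
            ctx.collect(fetcher.get());
            Thread.sleep(periodMillis);
        }
    }

    /** Called from another thread; the loop exits after its current iteration. */
    public void cancel() {
        running.set(false);
    }

    public static void main(String[] args) throws Exception {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        PollingSource source = new PollingSource(() -> "payload", 10);
        Thread worker = new Thread(() -> {
            try {
                source.run(out::add);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        Thread.sleep(100);   // let the source poll a few times
        source.cancel();     // ask the loop to stop
        worker.join();       // run() returns only after cancel()
        System.out.println("emitted " + out.size() + " records");
    }
}
```

Because run() keeps emitting for as long as the flag is true, a downstream 10-minute window would see every record fetched during that window rather than a single one. Note that in this sketch cancel() only takes effect once the current sleep finishes.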