Flink DataStream - how to start a source from an input element?
Say I have a Flink SourceFunction<String> called RequestsSource.
For each request coming in from that source, I would like to subscribe to an external data source (for the purposes of an example, it could start a separate thread and start producing data on that thread).
The output data could be joined on a single DataStream. For example:
Input requests: A, B
Data produced: A1 B1 A2 A3 B2 ...
... and so on, with new elements being added to the DataStream forever.
How do I write a Flink operator that can do this? Can I use, e.g., a FlatMapFunction?
You'd typically want to use an AsyncFunction, which can asynchronously take one input element, call some external service, and emit a collection of results.
See also Apache Flink Training - Async IO.
-- Ken
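To make the shape of the async approach concrete, here is a minimal plain-Java sketch with no Flink dependency. The class AsyncLookupSketch and the method lookup are hypothetical stand-ins for the external service call. In an actual Flink job, the body of AsyncFunction.asyncInvoke(input, resultFuture) would hand the resolved collection to resultFuture.complete(...). Note that each input resolves exactly once, to a finite collection.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

/** Plain-Java model of an async lookup; not Flink API. */
class AsyncLookupSketch {

    /**
     * Hypothetical external call: resolves one request to a finite batch of results.
     * In Flink, this future's result would be passed to ResultFuture.complete(...).
     */
    static CompletableFuture<Collection<String>> lookup(String request) {
        return CompletableFuture.supplyAsync(
                () -> Arrays.asList(request + "1", request + "2"));
    }

    public static void main(String[] args) {
        // Block only for demonstration; an async operator would register a callback instead.
        Collection<String> results = lookup("A").join();
        System.out.println(results); // prints [A1, A2]
    }
}
```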
It sounds like you are asking about an operator that, after receiving subscription events, can emit one or more unbounded streams of data based on a connection to an external service. The only clean way I can see to do this is to do all the work in the SourceFunction, or in a custom operator.
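As a rough illustration of doing all the work in one place, the following plain-Java sketch (no Flink dependency) models the fan-in: each request starts a producer thread, and a single loop drains one shared queue. The names SubscriptionFanIn, subscribe, and drain are hypothetical, and maxElements exists only so the demo terminates. In a real SourceFunction the drain loop would presumably sit in run(SourceContext) and emit with ctx.collect(...) while holding ctx.getCheckpointLock(), with cancel() flipping the running flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Plain-Java model of a source that merges per-request subscriptions; not Flink API. */
class SubscriptionFanIn {
    private final BlockingQueue<String> merged = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    /** Hypothetical stand-in for subscribing to the external data source for one request. */
    void subscribe(String request, int maxElements) {
        Thread producer = new Thread(() -> {
            // A real subscription would be unbounded; the cap keeps the demo finite.
            for (int i = 1; running && i <= maxElements; i++) {
                merged.add(request + i);
            }
        }, "subscription-" + request);
        producer.setDaemon(true);
        producer.start();
    }

    /** Single consumer loop: the only place elements leave the source. */
    List<String> drain(int count) {
        List<String> out = new ArrayList<>();
        try {
            while (running && out.size() < count) {
                String next = merged.poll(100, TimeUnit.MILLISECONDS);
                if (next != null) {
                    out.add(next); // in Flink: ctx.collect(next)
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    void cancel() {
        running = false;
    }

    public static void main(String[] args) {
        SubscriptionFanIn source = new SubscriptionFanIn();
        source.subscribe("A", 3);
        source.subscribe("B", 2);
        // Interleaving across A and B varies between runs; order within each request is kept.
        System.out.println(source.drain(5));
        source.cancel();
    }
}
```

Funnelling everything through one queue keeps emission single-threaded, which is what a source's emit loop expects. This sketch does not cover checkpointing or restoring active subscriptions after a failure.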
I don't believe async I/O can emit an unbounded stream of results from a single input event. A ProcessFunction can do that, but only via its onTimer method.