Say I have a Flink SourceFunction<String>
called RequestsSource
.
On each request coming in from that source, I would like to subscribe to an external data source (for the purposes of an example, it could start a separate thread and start producing data on that thread).
The output data could be joined on a single DataStream
. For example
Input Requests: A, B Data produced: A1 B1 A2 A3 B2 ...
... and so on, with new elements being added to the DataStream forever.
How do I write a Flink Operator that can do this? Can I use eg FlatMapFunction
?
you'd typically want to use an AsyncFunction , which (asynchronously) can take one input element, call some external service, and emit a collection of results.
See also Apache Flink Training - Async IO .
-- Ken
It sounds you are asking about an operator that can emit one or more boundless streams of data based on a connection to an external service, after receiving subscription events. The only clean way I can see to do this is to do all the work in the SourceFunction, or in a custom Operator.
I don't believe async i/o can emit an unbounded stream of results from a single input event. A ProcessFunction can do that, but only via its onTimer method.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.