
Flink DataStream - how to start a source from an input element?

Say I have a Flink SourceFunction<String> called RequestsSource.

On each request coming in from that source, I would like to subscribe to an external data source (for the purposes of an example, it could start a separate thread and start producing data on that thread).

The output data from all requests would be merged into a single DataStream. For example:

Input Requests: A, B
Data produced:
 A1
 B1
 A2
 A3
 B2
 ...

... and so on, with new elements being added to the DataStream forever.

How do I write a Flink operator that can do this? Can I use, e.g., a FlatMapFunction?

You'd typically want to use an AsyncFunction, which can asynchronously take one input element, call some external service, and emit a collection of results.
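To make the "one element in, a collection of results out" shape concrete, here is a minimal plain-Java sketch that models it with CompletableFuture. It is not Flink code: `AsyncLookupSketch`, `lookup`, and `process` are hypothetical names, and the "external service" is faked. In Flink the equivalent step would be `AsyncFunction.asyncInvoke` completing its `ResultFuture` once with a collection. Note that each request produces a finite batch that is delivered once.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Plain-Java model of the async request/response pattern; not Flink code.
class AsyncLookupSketch {

    // Hypothetical stand-in for a non-blocking call to an external service.
    // Each request yields a finite collection of results.
    static CompletableFuture<List<String>> lookup(String request) {
        return CompletableFuture.supplyAsync(
                () -> List.of(request + "1", request + "2", request + "3"));
    }

    // Starts one lookup per request without waiting, then gathers the
    // results in request order.
    static List<String> process(List<String> requests) {
        List<CompletableFuture<List<String>>> pending = requests.stream()
                .map(AsyncLookupSketch::lookup)
                .collect(Collectors.toList());
        return pending.stream()
                .flatMap(future -> future.join().stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [A1, A2, A3, B1, B2, B3]
        System.out.println(process(List.of("A", "B")));
    }
}
```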

See also Apache Flink Training - Async IO.

-- Ken

It sounds like you are asking about an operator that, after receiving subscription events, can emit one or more unbounded streams of data based on a connection to an external service. The only clean way I can see to do this is to do all the work in the SourceFunction, or in a custom operator.

I don't believe Async I/O can emit an unbounded stream of results from a single input event. A ProcessFunction can do that, but only via its onTimer method.
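The "do all the work in the source" idea can be modelled in plain Java as one thread per request feeding a shared queue, which stands in for the single merged stream. This is only a sketch under stated assumptions: `SubscribingSourceSketch`, `subscribe`, `next`, and `cancel` are hypothetical names, the subscription just counts, and `maxPerRequest` exists purely to bound the demo. In a real SourceFunction, the loop draining the queue would presumably live in `run()` and hand each element to `ctx.collect(...)`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java model of a source that owns its subscriptions; not Flink code.
class SubscribingSourceSketch {

    // Shared output queue: stands in for the single merged DataStream.
    private final BlockingQueue<String> output = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    // Hypothetical subscription: one thread per request, emitting
    // request+1, request+2, ... A real subscription would loop until
    // cancelled; maxPerRequest only bounds this demo.
    void subscribe(String request, int maxPerRequest) {
        Thread worker = new Thread(() -> {
            for (int i = 1; running && i <= maxPerRequest; i++) {
                output.add(request + i);
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Asks all subscription threads to stop emitting.
    void cancel() {
        running = false;
    }

    // Blocks until the next element from any subscription is available.
    String next() {
        try {
            return output.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        SubscribingSourceSketch source = new SubscribingSourceSketch();
        source.subscribe("A", 3);
        source.subscribe("B", 2);
        for (int i = 0; i < 5; i++) {
            // Interleaving of A and B elements varies between runs.
            System.out.println(source.next());
        }
        source.cancel();
    }
}
```

Elements from one request keep their relative order, but the interleaving across requests is not deterministic, which matches the A1, B1, A2, A3, B2 example in the question.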
