
Apache Beam: What is the difference between DoFn and SimpleFunction?

Original 2018-05-25 09:22:23 5 1 java / apache-beam

While reading about processing streaming elements in Apache Beam using Java, I came across DoFn<InputT, OutputT> and then SimpleFunction<InputT, OutputT>.

Both of these look similar to me and I find it difficult to understand the difference.

Can someone explain the difference in layman's terms?

1 answer

Conceptually, you can think of SimpleFunction as a simple case of DoFn:

SimpleFunction<InputT, OutputT>:
- simple input-to-output mapping function;
- a single input produces a single output;
- statically typed: you have to @Override the apply() method;
- doesn't depend on the computation context;
- can't use Beam's state APIs;
- example use case: MapElements.via(simpleFunction) to convert or modify elements one by one, producing one output for each element.
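
To make the one-in, one-out contract concrete, here is a minimal plain-Java sketch. It deliberately does not use the Beam SDK: `OneToOneFn` and `mapElements` are hypothetical stand-ins for `SimpleFunction.apply()` and `MapElements.via(...)`, so treat it as a model of the contract rather than as Beam's implementation. The point is only that the function sees an element and nothing else, and that the output has exactly as many elements as the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OneToOneSketch {
    /** Hypothetical stand-in for SimpleFunction: one value in, one value out. */
    public interface OneToOneFn<InputT, OutputT> {
        OutputT apply(InputT input);
    }

    /** Hypothetical stand-in for MapElements.via(fn): calls apply() exactly once per element. */
    public static <InputT, OutputT> List<OutputT> mapElements(
            List<InputT> input, OneToOneFn<InputT, OutputT> fn) {
        List<OutputT> output = new ArrayList<>();
        for (InputT element : input) {
            output.add(fn.apply(element));
        }
        return output;
    }

    /** Example function: maps each word to its length by overriding apply(). */
    public static List<Integer> wordLengths(List<String> words) {
        return mapElements(words, new OneToOneFn<String, Integer>() {
            @Override
            public Integer apply(String word) {
                return word.length();
            }
        });
    }

    public static void main(String[] args) {
        // Three inputs always give three outputs, even for the empty string.
        System.out.println(wordLengths(Arrays.asList("beam", "", "dofn"))); // prints [4, 0, 4]
    }
}
```

Note that the function cannot drop the empty string or split an element in two; with this contract, filtering or fan-out needs a different kind of function.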

DoFn<InputT, OutputT>:
- executed with ParDo;
- exposed to the context (timestamp, window, pane, etc.);
- can consume side inputs;
- can produce multiple outputs or no outputs at all;
- can produce side outputs (called additional outputs in current Beam releases);
- can use Beam's persistent state APIs;
- dynamically typed: the processing method is marked with @ProcessElement rather than overridden;
- example use case: read objects from a stream, filter them, accumulate them, perform aggregations, convert them, and dispatch them to different outputs.
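
The "multiple outputs or no outputs at all" and "different outputs" points can be modelled in plain Java as well. Again, this sketch does not use the Beam SDK: `EmittingFn`, `Emitter`, and `parDo` are hypothetical stand-ins for a DoFn's processing method, the receiver it writes to, and ParDo. It leaves out context, side inputs, and state entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmittingSketch {
    /** Hypothetical stand-in for the receiver a DoFn writes to: a main output plus one extra output. */
    public static class Emitter {
        public final List<String> main = new ArrayList<>();
        public final List<String> rejected = new ArrayList<>();
    }

    /** Hypothetical stand-in for a DoFn's processing method: returns nothing, emits as needed. */
    public interface EmittingFn<InputT> {
        void processElement(InputT element, Emitter out);
    }

    /** Hypothetical stand-in for ParDo: calls processElement once per input element. */
    public static <InputT> Emitter parDo(List<InputT> input, EmittingFn<InputT> fn) {
        Emitter out = new Emitter();
        for (InputT element : input) {
            fn.processElement(element, out);
        }
        return out;
    }

    /** Splits lines into words; blank lines emit nothing, lines starting with '#' go to the extra output. */
    public static Emitter splitWords(List<String> lines) {
        return parDo(lines, (line, out) -> {
            if (line.startsWith("#")) {
                out.rejected.add(line); // routed to the second output
                return;
            }
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.main.add(word); // zero, one, or many emissions per element
                }
            }
        });
    }

    public static void main(String[] args) {
        Emitter out = splitWords(Arrays.asList("a b", "", "# note", "c"));
        System.out.println(out.main);     // prints [a, b, c]
        System.out.println(out.rejected); // prints [# note]
    }
}
```

Four inputs produce three elements on the main output and one on the extra output, which a one-in, one-out function cannot express.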

You can find more specific examples and use cases for ParDo in the dev guide.

This part of the guide mentions MapElements, which is the use case for SimpleFunction.



The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


Answer 1 (score 10, accepted), 2018-05-31 19:41:57
