
Apache Beam: DoFn vs PTransform

Original question posted 2017-12-08 01:57:19 · Tags: google-cloud-dataflow, apache-beam

Both DoFn and PTransform are ways to define an operation on a PCollection. How do we know which one to use when?

1 answer

A simple way to understand it is by analogy with map(f) for lists:

The higher-order function map applies a function to each element of a list, returning a new list of the results. You might call it a computational pattern.
The function f is the logic applied to each element.
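Here is the list analogy in plain Python. The squaring function is just an illustrative choice. map is the reusable pattern, and f is the only part you write.

```python
def f(x: int) -> int:
    """Per-element logic: square one number."""
    return x * x


numbers = [1, 2, 3, 4]

# map is the pattern: it applies f to every element and we collect the
# results into a new list. The input list itself is left unchanged.
squares = list(map(f, numbers))
print(squares)  # [1, 4, 9, 16]
```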

Now, switching to Beam specifics: I think you are asking about ParDo.of(fn), which is a PTransform.

A PTransform is an operation that takes PCollections as input and yields PCollections as output. Beam has just five primitive types of PTransform, encapsulating embarrassingly parallel computational patterns.
ParDo is the computational pattern of per-element computation. It has some variations, but you don't need to worry about those for this question.
The DoFn, which I called fn above, is the logic that is applied to each element.
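To make this split concrete, here is a toy model in plain Python. It is not the Apache Beam API. PCollection, PTransform, ParDo, DoFn and the SplitWords example are simplified, hypothetical stand-ins that only mirror the roles described above. In the real SDKs, the counterparts are ParDo.of(fn) in Java and beam.ParDo(fn) in Python, with the per-element logic written in a DoFn.

```python
from typing import Generic, Iterable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class PCollection(Generic[A]):
    """Toy stand-in for a PCollection: an immutable bag of elements."""

    def __init__(self, elements: Iterable[A]) -> None:
        self._elements = tuple(elements)

    def elements(self) -> List[A]:
        return list(self._elements)


class PTransform(Generic[A, B]):
    """Toy stand-in: turns an input PCollection into an output PCollection."""

    def expand(self, pcoll: PCollection[A]) -> PCollection[B]:
        raise NotImplementedError


class DoFn(Generic[A, B]):
    """Per-element logic; process may emit zero, one, or many outputs."""

    def process(self, element: A) -> Iterator[B]:
        raise NotImplementedError


class ParDo(PTransform[A, B]):
    """The per-element pattern: a PTransform that applies a DoFn to every element."""

    def __init__(self, fn: DoFn[A, B]) -> None:
        self.fn = fn

    def expand(self, pcoll: PCollection[A]) -> PCollection[B]:
        outputs: List[B] = []
        for element in pcoll.elements():
            # The DoFn only ever sees one element at a time.
            outputs.extend(self.fn.process(element))
        return PCollection(outputs)


class SplitWords(DoFn[str, str]):
    """Example logic: emit each word of a line as a separate element."""

    def process(self, element: str) -> Iterator[str]:
        for word in element.split():
            yield word


lines = PCollection(["to be", "or not to be"])
words = ParDo(SplitWords()).expand(lines)
print(words.elements())  # ['to', 'be', 'or', 'not', 'to', 'be']
```

SplitWords never sees the whole collection. ParDo owns the iteration, and that separation is what allows a runner to parallelize it.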

It may also help to note that you write a DoFn to say what to do with each element, and Beam provides the ParDo that applies your logic across the whole PCollection, leaving the runner free to decide how to execute it.
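The following toy sketch shows this division of labor. Like the model above, it is not real Beam code. ExtractLength is the only user-written piece. The hypothetical toy_runner_pardo function plays the runner's role: it splits the input into bundles and processes them on worker threads. Real runners choose their own bundling and do not guarantee output order. Preserving order here is only a convenience so the results are easy to compare.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


class ExtractLength:
    """User-written, DoFn-like logic: it only says what to do with ONE element
    and knows nothing about how the input is split or where it runs."""

    def process(self, element: str) -> Iterator[int]:
        yield len(element)


def toy_runner_pardo(fn, elements: List[str], num_bundles: int) -> List[int]:
    """Hypothetical runner: split the input into contiguous bundles, process
    each bundle on a worker thread, then concatenate outputs in bundle order."""
    size = max(1, -(-len(elements) // num_bundles))  # ceiling division
    bundles = [elements[i:i + size] for i in range(0, len(elements), size)]

    def process_bundle(bundle: List[str]) -> List[int]:
        return [out for element in bundle for out in fn.process(element)]

    with ThreadPoolExecutor(max_workers=num_bundles) as pool:
        # pool.map returns results in the same order as the bundles.
        results = list(pool.map(process_bundle, bundles))
    return [out for bundle_out in results for out in bundle_out]


terms = ["beam", "dofn", "ptransform", "pardo", "runner"]
one_bundle = toy_runner_pardo(ExtractLength(), terms, num_bundles=1)
three_bundles = toy_runner_pardo(ExtractLength(), terms, num_bundles=3)
print(one_bundle)  # [4, 4, 10, 5, 6]
```

The same ExtractLength logic produces the same results whether the toy runner uses one bundle or three. The user-written code never changes when the execution strategy does.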

How to apply a DoFn PTransform to a PCollectionTuple in Apache Beam

Add dependency between 2 Dofn in Apache Beam

how to write to GCS with a ParDo and a DoFn in apache beam

Apply not applicable with ParDo and DoFn using Apache Beam

Apply stateful DoFn per key in Apache Beam

Apache Beam: How To Simultaneously Create Many PCollections That Undergo Same PTransform?

IllegalMutationException from Beam PTransform

AttributeError: module 'apache_beam' has no attribute 'DoFn'

How to write a splittable DoFn in python - convert json to ndjson in apache beam

Apache Beam - Unable to infer a Coder on a DoFn with multiple output tags


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.



Accepted answer · 15 votes · posted 2017-12-08 03:48:29
