Both DoFn and PTransform are ways to define an operation on a PCollection. How do we know which to use when?
A simple way to understand it is by analogy with map(f) for lists: map applies a function to each element of a list, returning a new list of the results. You might call it a computational pattern. f is the logic applied to each element. Now, switching to Beam specifics, I think you are asking about ParDo.of(fn), which is a PTransform.
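To make the analogy concrete, here is a minimal sketch in plain Java, with no Beam involved. The class name MapAnalogy and the sample words are made up for illustration. The map method plays the role of the computational pattern, and f is the per-element logic that knows nothing about lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapAnalogy {
    // The "computational pattern": apply f to every element and
    // collect the results into a new list.
    static <A, B> List<B> map(Function<A, B> f, List<A> input) {
        return input.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // f is only the per-element logic; it never sees the list.
        Function<String, Integer> f = String::length;

        List<Integer> lengths = map(f, Arrays.asList("beam", "pardo", "fn"));
        System.out.println(lengths); // prints [4, 5, 2]
    }
}
```

In this picture, map corresponds to ParDo and f corresponds to the DoFn.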
A PTransform is an operation that takes PCollections as input and yields PCollections as output. Beam has just five primitive types of PTransform, encapsulating embarrassingly parallel computational patterns. ParDo is the computational pattern of per-element computation. It has some variations, but you don't need to worry about those for this question. The DoFn, which I called fn here, is the logic that is applied to each element. It may also help to think of it this way: you write a DoFn to say what to do with each element, and the Beam runner provides the ParDo to apply your logic.
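As a rough sketch of how the two pieces fit together in the Beam Java SDK: the example below assumes the Beam Java SDK and a runner (for instance the direct runner) are on the classpath. The names DoFnVersusParDo, UppercaseFn, and uppercase, as well as the sample input, are hypothetical and chosen only for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

class DoFnVersusParDo {
    // The DoFn: only the per-element logic (hypothetical example).
    static class UppercaseFn extends DoFn<String, String> {
        @ProcessElement
        public void processElement(@Element String word, OutputReceiver<String> out) {
            out.output(word.toUpperCase());
        }
    }

    // ParDo.of(fn) wraps the DoFn in a PTransform, which is what
    // actually gets applied to a PCollection.
    static PCollection<String> uppercase(PCollection<String> words) {
        return words.apply(ParDo.of(new UppercaseFn()));
    }

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        // Sample in-memory input, purely for illustration.
        PCollection<String> words = pipeline.apply(Create.of("beam", "pardo", "fn"));
        uppercase(words);

        pipeline.run().waitUntilFinish();
    }
}
```

Note that UppercaseFn never mentions a PCollection. It only handles one element at a time, and ParDo is what lifts it to the whole collection.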