
Combining Java and Python in Apache Beam pipeline

Is it possible to combine Java and Python transforms in Apache Beam?

Here is the use case (i.e., the dream plan): the raw input data arrives at a very high rate, so some initial aggregation is needed in a reasonably fast language (e.g., Java). The aggregated values are then passed to a few transforms (implemented in Python) and through a stack of machine learning models (also implemented in Python) to produce predictions, which are then consumed again by some Java code.

Is it possible in Apache Beam?

Thank you very much for your help!

It should be possible. You need an ExternalTransform and an expansion service.

Here is a test pipeline that does this:

counts = (lines
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(bytes))
          | 'count' >> beam.ExternalTransform(
              'beam:transforms:xlang:count', None, EXPANSION_SERVICE_ADDR))

Here beam:transforms:xlang:count is the URN of a transform that must be known to the expansion service. This example uses a custom expansion service that expands that URN into a Java PTransform; you can build your own along the same lines.

You can see how this example is started here.
