Combining Java and Python in Apache Beam pipeline

Is it possible to combine Java and Python transforms in Apache Beam?

Here is the use case (i.e. the dream plan): the raw input data arrives at a very high rate, so some initial aggregation is needed in a reasonably fast language (e.g. Java). The aggregated values are then given to a few transforms (implemented in Python) and passed through a stack of machine learning models (also implemented in Python) to produce predictions, which are then used again in some Java code.

Is this possible in Apache Beam?

Thank you very much for your help!

It should be possible. You need an ExternalTransform and an expansion service.

See here for a test pipeline that does this:

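import apache_beam as beam

# `lines`, WordExtractingDoFn and EXPANSION_SERVICE_ADDR are defined elsewhere in the test pipeline.
counts = (lines
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(bytes))
          # Hand the counting step to the transform registered under this URN.
          | 'count' >> beam.ExternalTransform(
              'beam:transforms:xlang:count', None, EXPANSION_SERVICE_ADDR))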

Here beam:transforms:xlang:count is the URN of a transform that must be known to the expansion service. This example uses a custom expansion service that expands that URN into a Java PTransform; you can build your own along the same lines.

You can see how this example is started here.
