Combining Java and Python in an Apache Beam pipeline
Is it possible to combine Java and Python transforms in Apache Beam?
Here is the use case (i.e. the dream plan): the raw input data arrives at a very high rate, so some initial aggregation is needed in a reasonably fast language (e.g. Java). The aggregated values are then passed to a few transforms (implemented in Python) and through a stack of machine learning models (also implemented in Python) to produce predictions, which are then used again in some Java code.
Is this possible in Apache Beam?
Thank you very much for your help!
It should be possible. You need an ExternalTransform and an expansion service.
See this test pipeline, which does exactly that:
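counts = (lines
          # Python-side step: a regular ParDo whose output is declared as bytes.
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(bytes))
          # Cross-language step: URN, payload (None here), expansion service address.
          | 'count' >> beam.ExternalTransform(
              'beam:transforms:xlang:count', None, EXPANSION_SERVICE_ADDR))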
Here beam:transforms:xlang:count is the URN of a transform that the expansion service must know about. This example uses a custom expansion service that expands that URN into a Java PTransform; you can build your own along the same lines.
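To make the URN lookup idea more concrete, here is a toy model in plain Python. It is only a sketch of the concept, not Beam's actual API: ToyExpansionService, register, expand and count_factory are hypothetical names made up for illustration, and a real expansion service runs as a separate process and returns an expanded pipeline fragment rather than a Python callable.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a transform: takes a list of elements,
# returns a list of (element, count) pairs.
Transform = Callable[[List[bytes]], List[Tuple[bytes, int]]]


class ToyExpansionService:
    """Toy model of an expansion service (NOT Beam's real API).

    It keeps a registry that maps URNs to transform factories.
    """

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[Optional[bytes]], Transform]] = {}

    def register(self, urn: str,
                 factory: Callable[[Optional[bytes]], Transform]) -> None:
        # Make a transform known to the service under the given URN.
        self._registry[urn] = factory

    def expand(self, urn: str, payload: Optional[bytes] = None) -> Transform:
        # A URN the service does not know about cannot be expanded.
        if urn not in self._registry:
            raise KeyError(f"unknown transform URN: {urn}")
        return self._registry[urn](payload)


def count_factory(payload: Optional[bytes]) -> Transform:
    # The payload is ignored here, mirroring the None passed in the example.
    def count(elements: List[bytes]) -> List[Tuple[bytes, int]]:
        return sorted(Counter(elements).items())
    return count


service = ToyExpansionService()
service.register('beam:transforms:xlang:count', count_factory)

# The caller only needs the URN (plus an optional payload).
count = service.expand('beam:transforms:xlang:count')
result = count([b'a', b'b', b'a'])
```

The point of the sketch is that the calling side never sees the implementation: it names a transform by URN, and whatever the service has registered under that URN is what gets run.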
You can see how this example is started here.