How to run a Beam pipeline written in Python on Apache Flink

I have used the Python SDK to write my Beam pipelines. I am using Celery as a wrapper over the DirectRunner. I want to use the Flink runner to parallelise my load.

According to the documentation, the Flink runner expects the job to be submitted as a JAR file.

Can you point me to any resources that show how to use the Apache Beam Python SDK together with Apache Flink? Any samples?

As of now (Apache Beam 2.2.0), the Apache Beam Python SDK does not support the Apache Flink runner. If you try to use FlinkRunner in your Python pipeline, you will get a ValueError:

ValueError: Unexpected pipeline runner: FlinkRunner. Valid values are DirectRunner, EagerRunner, DataflowRunner, TestDataflowRunner or the fully qualified name of a PipelineRunner subclass.

You can see this in the source code here: https://github.com/apache/beam/blob/d11b9e9560131f55b418a13a7d10401c2135fb33/sdks/python/apache_beam/runners/runner.py#L62
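To make the error easier to follow, here is a minimal sketch of the kind of runner-name check that produces it. This is a simplified stand-in, not the actual Beam source: the helper name `resolve_runner_name` is hypothetical, the list of runners is copied from the error message, and the case-insensitive match on short names is an assumption.

```python
# Simplified stand-in for a runner-name check; NOT the actual Beam source.
# The names below are copied from the error message quoted in this answer.
KNOWN_PYTHON_RUNNERS = (
    "DirectRunner",
    "EagerRunner",
    "DataflowRunner",
    "TestDataflowRunner",
)


def resolve_runner_name(runner_name):
    """Hypothetical helper: return an accepted runner name or raise ValueError."""
    # Assumption: short names are matched case-insensitively.
    for known in KNOWN_PYTHON_RUNNERS:
        if runner_name.lower() == known.lower():
            return known
    # A dotted name is treated as the fully qualified name of a
    # PipelineRunner subclass and passed through unchanged.
    if "." in runner_name:
        return runner_name
    # Anything else (for example "FlinkRunner") is rejected.
    raise ValueError(
        "Unexpected pipeline runner: %s. Valid values are %s "
        "or the fully qualified name of a PipelineRunner subclass."
        % (runner_name, ", ".join(KNOWN_PYTHON_RUNNERS))
    )
```

In this sketch, calling `resolve_runner_name("FlinkRunner")` raises the ValueError, because the name is neither in the known list nor a dotted class path.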
