
Expected ETA to avail pipeline I/O and runtime parameters in an Apache Beam GCP Dataflow pipeline using Python?

Just wanted to know whether more pipeline I/O connectors and runtime parameters are available with the new version (3.x) of Python. If I am correct, Apache Beam currently provides only file-based IOs for Python: textio, avroio, and tfrecordio. With Java, however, more options are available, such as file-based IOs, BigQueryIO, BigtableIO, PubSubIO, and SpannerIO.

For my requirement I want to use BigQueryIO in a GCP Dataflow pipeline using Python 3.x, but currently it is not available. Does anyone have an update on the ETA for when it will be available in Apache Beam?

The BigTable connector for Python 3 has been under development for some time now. Currently there is no ETA, but you can follow the relevant pull request in the official Apache Beam repository for further updates.

BigQueryIO has been available in the Apache Beam Python SDK for quite some time.

There is also a Pub/Sub IO available, as well as BigTable (write). SpannerIO is being worked on as we speak.

This page has more detail: https://beam.apache.org/documentation/io/built-in/

UPDATE:

In line with the OP giving more details, it turns out that using value providers in the BigQuery query string was indeed not supported.

This has been remedied in the following PR: https://github.com/apache/beam/pull/11040 and will most likely be part of the 2.21.0 release.

UPDATE 2: This new feature has been added in the 2.20.0 release of Apache Beam: https://beam.apache.org/blog/2020/04/15/beam-2.20.0.html

Hope it solves your problem!

