
Expected ETA to avail pipeline I/O and runtime parameters in an Apache Beam GCP Dataflow pipeline using Python?

Just wanted to know whether more pipeline I/O connectors and runtime parameters are available with the new version (3.x) of Python. If I am correct, Apache Beam currently provides only file-based IOs for Python: textio, avroio, and tfrecordio. With Java, however, more options are available, such as file-based IOs, BigQueryIO, BigtableIO, PubSubIO, and SpannerIO.

For my requirement I want to use BigQueryIO in a GCP Dataflow pipeline using Python 3.x, but currently it is not available. Does anyone have an update on the ETA for when it will be available in Apache Beam?

The BigTable connector for Python 3 has been under development for some time now. Currently there is no ETA, but you can follow the relevant pull request in the official Apache Beam repository for further updates.

BigQueryIO has been available in the Apache Beam Python SDK for quite some time.

There is also a Pub/Sub IO available, as well as BigTable (write). SpannerIO is being worked on as we speak.

This page has more detail: https://beam.apache.org/documentation/io/built-in/

UPDATE:

In line with the OP giving more details, it turns out that using value providers in the BigQuery query string was indeed not supported.

This has been remedied in the following PR: https://github.com/apache/beam/pull/11040 and will most likely be part of the 2.21.0 release.

UPDATE 2: This new feature has been added in the 2.20.0 release of Apache Beam: https://beam.apache.org/blog/2020/04/15/beam-2.20.0.html

Hope it solves your problem!

