
Is there a withFormatFunction equivalent in the Apache Beam Python SDK?

I'm passing a PCollection of dictionaries to the WriteToBigQuery class. However, some fields of the dictionaries aren't meant to be written to the BigQuery tables; they're only there to decide the table name for each element (in streaming mode), which is done by passing a callable as the table parameter. Is this possible to do in Beam Python? It's possible in the Java SDK through BigQueryIO's withFormatFunction. Cheers.
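To illustrate the situation described above, here is a minimal sketch (field names like `_target_table` and the project/dataset spec are hypothetical, not from the question). The callable passed as `table=` to `WriteToBigQuery` sees the full element, so stripping the routing field before the write would also hide it from the callable:

```python
# Hypothetical example: each element carries a routing field that
# should drive the destination table but not be written to it.
raw = {"user": "alice", "score": 42, "_target_table": "daily"}

ROUTING_FIELDS = {"_target_table"}

def table_fn(element):
    # This is the kind of callable passed as WriteToBigQuery(table=...):
    # it receives the whole element and returns a table spec string.
    return f"my-project:my_dataset.scores_{element['_target_table']}"

def strip_routing(element):
    # The dilemma: without a withFormatFunction equivalent, stripping
    # routing fields *before* WriteToBigQuery means table_fn can no
    # longer see them; leaving them in means they get written to BQ.
    return {k: v for k, v in element.items() if k not in ROUTING_FIELDS}
```

In Java, `withFormatFunction` resolves this by converting each element to a `TableRow` *after* the dynamic-destination logic has run.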

There's not currently an equivalent of withFormatFunction in Beam Python. If you have a fixed set of output tables, you can have a separate WriteToBigQuery transform for each one, and branch earlier in the pipeline. You could potentially also make a PCollection of objects of a type that acts like a dict (containing the payload) but also has fields on it that the table name callable can read.
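The second suggestion can be sketched roughly like this (`RoutedRow` and `table_for` are hypothetical names, not Beam APIs): a dict subclass whose mapping contents are the BigQuery payload, with the routing metadata held as a plain attribute that only the table callable reads.

```python
# Hedged sketch, assuming WriteToBigQuery serializes the element's
# dict contents as the row while the table callable inspects the
# extra attribute. Not a Beam API; illustrative only.
class RoutedRow(dict):
    """Acts like a dict (the BigQuery payload) but carries
    out-of-band routing metadata for the table callable."""
    def __init__(self, payload, table_suffix):
        super().__init__(payload)          # dict contents = the row to write
        self.table_suffix = table_suffix   # read only when routing

def table_for(element):
    # Would be passed as WriteToBigQuery(table=table_for, ...).
    return f"my-project:my_dataset.events_{element.table_suffix}"

row = RoutedRow({"user": "alice", "score": 42}, table_suffix="daily")
```

One caveat: Beam's default coders may not preserve extra attributes on a dict subclass when elements cross stage boundaries, so a custom coder (or keeping this transform fused with the write) may be needed.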

