Dataflow Template/Pattern for enriching fixed BigQuery data by streaming Pub/Sub data
I have a BigQuery dimension table (which doesn't change much) and streaming JSON data from Pub/Sub. What I want to do is query this dimension table and enrich the incoming Pub/Sub data by joining against it, then write those streams of joined data to another BigQuery table.
As I am new to Dataflow/Beam and the concept is still not that clear to me (or at least I have difficulty starting to write the code), I have a number of questions. For example, is a side input the way to go, something like ParDo.of(...).withSideInputs(PCollectionView<Map<String, String>> map)?

You need to join two PCollections.
One option could be to use the PeriodicImpulse transform and your own ParDo to create a periodically changing input. See here for an example (please note that the PeriodicImpulse transform was added recently).
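The periodically refreshed side input described above could be sketched as follows, assuming the Apache Beam Java SDK (with the GCP IO module) is on the classpath; the refresh interval and the readDimensionTable() helper are placeholders, not part of the original answer:

```java
// Assumes Apache Beam Java SDK; readDimensionTable() is a hypothetical
// helper that queries the BigQuery dimension table via the client library.
PCollectionView<Map<String, String>> dimensionView =
    pipeline
        // Fire an impulse every 5 minutes; each impulse triggers a re-read,
        // so the side input picks up changes to the slowly changing table.
        .apply("Tick", PeriodicImpulse.create()
            .withInterval(Duration.standardMinutes(5))
            .applyWindowing())
        .apply("ReadDimension", ParDo.of(new DoFn<Instant, KV<String, String>>() {
          @ProcessElement
          public void process(ProcessContext c) {
            // Emit one KV per dimension row (key -> attribute to join on).
            for (Map.Entry<String, String> row : readDimensionTable().entrySet()) {
              c.output(KV.of(row.getKey(), row.getValue()));
            }
          }
        }))
        // Materialize the latest window of rows as a Map side input.
        .apply(View.asMap());
```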
You can combine the data in a ParDo where PCollection (1) is the main input and PCollection (2) is a side input (similar to the example above).
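The join itself could then look like the sketch below, where the Pub/Sub stream is the main input and the dimension map is the side input. The field names ("customer_id", "customer_name") are illustrative assumptions; the main input's windowing must map onto the side input's windows for c.sideInput(...) to resolve:

```java
// pubsubMessages: PCollection<TableRow> parsed from the Pub/Sub JSON stream.
// dimensionView: the PCollectionView<Map<String, String>> built earlier.
PCollection<TableRow> enriched =
    pubsubMessages.apply("Enrich", ParDo.of(new DoFn<TableRow, TableRow>() {
      @ProcessElement
      public void process(ProcessContext c) {
        Map<String, String> dim = c.sideInput(dimensionView);
        TableRow row = c.element().clone();
        // Join key and enriched field are placeholders for illustration.
        String key = (String) row.get("customer_id");
        row.set("customer_name", dim.getOrDefault(key, "UNKNOWN"));
        c.output(row);
      }
    }).withSideInputs(dimensionView));
```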
Finally, you can stream the output to BigQuery using the BigQueryIO.Write transform.
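The final write step could be sketched like this; the project, dataset, and table names are placeholders, and streaming inserts are chosen here simply because the pipeline is a streaming one:

```java
// Append the enriched rows to the target table. The table spec below is
// a placeholder; the table is assumed to already exist (CREATE_NEVER).
enriched.apply("WriteToBigQuery",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.enriched_table")
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        // Low-latency writes for a streaming pipeline.
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));
```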