
Apache Beam Session Windowing and joining across PCollections

We have two streams, S1 and S2, of events that share the same keys (userId). Is it possible to apply a session window across both collections, so that an occurrence of key X in either stream contributes to the session? Would this create windows spanning both PCollections that would let us join them afterwards?

For context:

  • We are using the DataflowRunner
  • Both S1 and S2 are unbounded collections read from PubsubIO

Many thanks!

Yes, this is possible. Windowing only takes effect when you perform a grouping operation, so you can assign the same session windowing to both streams and then group them together. For example:

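import apache_beam as beam
from apache_beam import window

p = beam.Pipeline(...)

# ReadMyFirstStream / ReadMySecondStream are placeholders for your own sources.
# Assume each element is a (userId, value) pair and that timestamp
# information is already in the streams.
# Each WindowInto gets its own label, since labels must be unique in a pipeline.
first_stream = (p | ReadMyFirstStream()
                | 'WindowFirst' >> beam.WindowInto(window.Sessions(...)))
second_stream = (p | ReadMySecondStream()
                 | 'WindowSecond' >> beam.WindowInto(window.Sessions(...)))

# CoGroupByKey is the grouping step; sessions are merged per key here.
joined_streams = (
    {'first': first_stream,
     'second': second_stream}
    | beam.CoGroupByKey())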

The resulting joined_streams PCollection contains, for each key and session window, the elements from both streams grouped together.
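
To make the merging behaviour concrete, here is a minimal plain-Python sketch (it does not use Beam at all) of how sessions for one key can form from events in both streams. The function name merge_sessions, the (key, timestamp, value) event shape, and the sample data are all assumptions made for illustration. It models each event as occupying the interval [t, t + gap) and merges overlapping intervals per key, which is the general idea behind session windows, not Beam's actual implementation.

```python
from collections import defaultdict


def merge_sessions(first, second, gap):
    """Group (key, timestamp, value) events from two streams into per-key sessions.

    Each event at time t occupies [t, t + gap). Overlapping intervals for the
    same key are merged into one session, whichever stream they came from.
    """
    by_key = defaultdict(list)
    for tag, stream in (("first", first), ("second", second)):
        for key, ts, value in stream:
            by_key[key].append((ts, tag, value))

    result = []
    for key, events in by_key.items():
        events.sort(key=lambda e: e[0])  # order by timestamp only
        current = None
        for ts, tag, value in events:
            if current is None or ts >= current["end"]:
                # No overlap with the open session: start a new one.
                current = {"key": key, "start": ts, "end": ts + gap,
                           "first": [], "second": []}
                result.append(current)
            else:
                # Overlap: extend the open session.
                current["end"] = max(current["end"], ts + gap)
            current[tag].append(value)
    return result


# Hypothetical sample events: (userId, timestamp in seconds, payload).
s1 = [("x", 0, "click"), ("x", 100, "click")]
s2 = [("x", 5, "purchase"), ("y", 7, "purchase")]

sessions = merge_sessions(s1, s2, gap=10)
for session in sessions:
    print(session)
```

In this sample, the event for key "x" at time 5 in the second stream falls inside the interval opened by the event at time 0 in the first stream, so both land in the same session. The later event at time 100 starts a separate session for the same key.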


This works in Java as well; I answered in Python for the sake of simplicity. Let me know if you'd prefer Java code.
