Apache Beam Session Windowing and joining across PCollections
We have two streams, S1 and S2, of events that share the same keys (userId).
Is it possible to apply a session window across both collections so that an occurrence of key X in either stream would contribute to the session? Would this create windows across PCollections that would let us join these afterwards?
For Context:
Many Thanks!
This is correct. You can do this because windows only come into play when you perform grouping operations. This means that you can do something simple like this:
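import apache_beam as beam
from apache_beam import window

p = beam.Pipeline(...)
# Assume that timestamp information is already in the streams.
# ReadMyFirstStream / ReadMySecondStream stand in for your own sources; they
# need to emit (userId, event) pairs, and both should use the same session gap.
first_stream = p | ReadMyFirstStream() | beam.WindowInto(window.Sessions(...))
second_stream = p | ReadMySecondStream() | beam.WindowInto(window.Sessions(...))
joined_streams = (
    {'first': first_stream,
     'second': second_stream}
    | beam.CoGroupByKey())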
The joined_streams PCollection will contain session windows in which elements from both streams are grouped together per key.
This will work in Java as well. I answered using Python for the sake of simplicity. Let me know if you'd prefer Java code.
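To make the grouping behaviour concrete, here is a rough plain-Python sketch of what the join amounts to per key. It is not Beam code and does not use the Beam API. The function name cogroup_sessions, the sample events and the 60-second gap are all made up for illustration, and it assumes both inputs use the same gap and that two events for a key share a session when they are closer together than that gap.

```python
from collections import defaultdict


def cogroup_sessions(first, second, gap):
    """Illustrative only: mimic per-key session grouping across two inputs.

    `first` and `second` are lists of (key, timestamp) pairs. Events for the
    same key end up in the same session when they are closer than `gap`,
    no matter which input they came from.
    """
    # Tag each event with the input it came from, like the dict keys
    # passed to CoGroupByKey in the pipeline above.
    tagged = [(k, ts, "first") for k, ts in first]
    tagged += [(k, ts, "second") for k, ts in second]

    by_key = defaultdict(list)
    for key, ts, tag in tagged:
        by_key[key].append((ts, tag))

    result = {}
    for key, items in by_key.items():
        items.sort()  # process each key's events in timestamp order
        sessions = []
        last_ts = None
        for ts, tag in items:
            # Start a new session when the gap since the previous event
            # for this key is at least `gap`.
            if last_ts is None or ts - last_ts >= gap:
                sessions.append({"first": [], "second": []})
            sessions[-1][tag].append(ts)
            last_ts = ts
        result[key] = sessions
    return result


# Hypothetical sample data: (userId, timestamp in seconds).
s1 = [("alice", 0), ("alice", 50), ("bob", 10)]
s2 = [("alice", 30), ("alice", 500), ("bob", 400)]

result = cogroup_sessions(s1, s2, gap=60)
for user, sessions in sorted(result.items()):
    print(user, sessions)
```

In this sample, the alice event at 30 from the second input falls between her events at 0 and 50 from the first input, so all three land in one session, while her event at 500 forms a session of its own that has nothing on the 'first' side.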