We have two streams, S1 and S2, whose events share the same key (userId). Is it possible to apply a session window across both collections, so that an occurrence of key X in either stream contributes to the same session? Would this create windows spanning both PCollections that would let us join them afterwards?
For Context:
Many Thanks!
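Yes, this works. Windowing only takes effect when you perform a grouping operation, and session windows are merged per key at that point. If you window both streams with the same Sessions gap and then join them with CoGroupByKey (which tags and flattens its inputs and then runs a single GroupByKey), events for key X from either stream extend the same session. You can do something as simple as this: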
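import apache_beam as beam
from apache_beam.transforms import window

p = beam.Pipeline(...)
# Assume timestamps are already attached and each element is a (userId, event) pair.
# ReadMyFirstStream/ReadMySecondStream are placeholders for your actual sources.
# Use the same session gap (in seconds) for both streams so their windows can merge.
first_stream = (p | 'ReadFirst' >> ReadMyFirstStream()
                | 'WindowFirst' >> beam.WindowInto(window.Sessions(...)))
second_stream = (p | 'ReadSecond' >> ReadMySecondStream()
                 | 'WindowSecond' >> beam.WindowInto(window.Sessions(...)))
joined_streams = (
    {'first': first_stream,
     'second': second_stream}
    | beam.CoGroupByKey())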
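Because both inputs are grouped together by key, overlapping session windows from the two streams are merged. Each element of joined_streams is therefore (userId, {'first': [...], 'second': [...]}) for one merged session, holding that user's events from both streams.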
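To make the merging behaviour concrete, here is a plain-Python simulation of what the grouping step does under session windowing. It is only an illustration, not Beam's implementation: the function name cogroup_sessions and the (key, timestamp, value) input format are hypothetical. It assumes Beam's session semantics, where each event starts with the window [ts, ts + gap) and windows for the same key that overlap are merged.

```python
from collections import defaultdict


def cogroup_sessions(first, second, gap):
    """Hypothetical simulation of CoGroupByKey over two session-windowed streams.

    first/second: lists of (key, timestamp, value) tuples.
    gap: session gap; each event starts as the window [ts, ts + gap).
    Returns {key: [(start, end, {'first': [...], 'second': [...]}), ...]}.
    """
    # Tag each event with its source stream and pool events per key,
    # roughly what CoGroupByKey does before its internal GroupByKey.
    per_key = defaultdict(list)
    for tag, stream in (('first', first), ('second', second)):
        for key, ts, value in stream:
            per_key[key].append((ts, ts + gap, tag, value))

    result = {}
    for key, events in per_key.items():
        sessions = []  # each entry: [start, end, {'first': [...], 'second': [...]}]
        for start, end, tag, value in sorted(events, key=lambda e: e[0]):
            if sessions and start < sessions[-1][1]:
                # Overlaps the current session: extend it, whichever stream it came from.
                current = sessions[-1]
                current[1] = max(current[1], end)
                current[2][tag].append(value)
            else:
                # No overlap: start a new session for this key.
                sessions.append([start, end, {'first': [], 'second': []}])
                sessions[-1][2][tag].append(value)
        result[key] = [tuple(s) for s in sessions]
    return result


out = cogroup_sessions(
    first=[('x', 0, 'click'), ('x', 100, 'click2')],
    second=[('x', 5, 'buy'), ('y', 0, 'view')],
    gap=10,
)
for key in sorted(out):
    print(key, out[key])
# x [(0, 15, {'first': ['click'], 'second': ['buy']}), (100, 110, {'first': ['click2'], 'second': []})]
# y [(0, 10, {'first': [], 'second': ['view']})]
```

Here the 'buy' event from the second stream falls inside the session opened by 'click' in the first stream, so both end up in one session [0, 15) for key x, while 'click2' starts a separate session because it arrives after the gap.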
This will work in Java as well. I answered using Python for the sake of simplicity. Let me know if you'd prefer Java code.