Apache Beam - adding a delay into a pipeline
I have a simple pipeline that reads from a Pub/Sub topic and writes to BigQuery. I would like to introduce a 5 minute delay between reading the message from the topic and writing it to BQ.
I thought I could do this using a trigger, similar to the one below, however the message still goes straight through with no delay.
    PCollection<PubsubMessage> windowed_inputEvents =
        inputEvents.apply(
            Window.<PubsubMessage>into(FixedWindows.of(Duration.standardMinutes(1)))
                .triggering(
                    AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(5)))
                .withAllowedLateness(Duration.standardMinutes(1))
                .discardingFiredPanes());
Is it possible to create such a delay using triggers?
Thanks
It looks like you are mixing up a couple of things. In your example you have a fixed window of 1 minute, which means that at the end of the window all the data elements that are part of the window are emitted.
Triggers are basically additional levers that you can use to emit data before a window is closed. Triggers cannot hold data after the window has closed. For example, if the window is between 12:00 and 12:01 and the first element arrives at 12:00, then the element is emitted when the window closes at 12:01; it is not held back until 12:05.
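The 12:00 to 12:01 example can be checked with a little plain-Java arithmetic. This is only a sketch of how a fixed window is derived from a timestamp, not Beam code: the class and helper names (`FixedWindowSketch`, `windowStart`, `windowEnd`) are hypothetical, windows are assumed to be aligned to the epoch, and event time and processing time are treated as the same clock for simplicity.

```java
import java.time.Duration;
import java.time.Instant;

// Plain-Java illustration of fixed-window assignment; hypothetical helper, not Beam's implementation.
final class FixedWindowSketch {
    // Start of the fixed window containing the timestamp (windows aligned to the epoch).
    static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMillis = size.toMillis();
        long ts = timestamp.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMillis));
    }

    // Exclusive end of that window.
    static Instant windowEnd(Instant timestamp, Duration size) {
        return windowStart(timestamp, size).plus(size);
    }

    public static void main(String[] args) {
        Duration size = Duration.ofMinutes(1);
        Instant first = Instant.parse("2020-01-01T12:00:00Z");
        Instant triggerTarget = first.plus(Duration.ofMinutes(5));

        System.out.println("window: [" + windowStart(first, size) + ", " + windowEnd(first, size) + ")");
        System.out.println("5 minute trigger target: " + triggerTarget);
        // The trigger target falls after the window end, which is the situation described above.
        System.out.println("target after window end: " + triggerTarget.isAfter(windowEnd(first, size)));
    }
}
```

With a 1 minute window, the 5 minute trigger target always lands after the window has ended, which is why the trigger never gets the chance to hold the element back.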
To meet your requirements, you can do a couple of things:
withTriggeringFrequency
If this cannot be achieved in BigQueryIO, you can use the FILE_LOADS method to write data to BigQuery in batches; this API also supports a duration through withTriggeringFrequency. More details can be found here - https://beam.apache.org/releases/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTriggeringFrequency-org.joda.time.Duration-
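A minimal sketch of the FILE_LOADS approach might look like the following. It assumes the Beam Java SDK with the GCP IO module on the classpath, that the Pub/Sub messages have already been converted to `TableRow` upstream, and that the destination table already exists; the class name, method name and table spec are hypothetical placeholders.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Sketch only: batches an unbounded PCollection into BigQuery load jobs.
final class DelayedBigQueryWrite {
    // Hypothetical destination; replace with a real "project:dataset.table" spec.
    static final String TABLE = "my-project:my_dataset.my_table";

    static void writeEveryFiveMinutes(PCollection<TableRow> rows) {
        rows.apply(
            "WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(TABLE)
                // Use load jobs instead of streaming inserts.
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                // Start a load job for the accumulated data every 5 minutes.
                .withTriggeringFrequency(Duration.standardMinutes(5))
                // Shard count for the files behind each load job; 1 is an arbitrary choice here.
                .withNumFileShards(1)
                // Assumes the table already exists, so no schema is supplied.
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }
}
```

Note that this batches rows rather than delaying each one individually: an element waits up to 5 minutes for the next load job, plus the time the job itself takes, so the delay is not an exact 5 minutes per message.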