
What's the difference between Processing Time and Event Time in Apache Beam?

According to the Apache Beam documentation:

“event time” determined by the timestamp on the data element itself

“processing time”, determined by the clock on the system processing the element

My data is a JSON file, and none of its fields is a timestamp. What is my event time in this case?

I'm ingesting the data via Pub/Sub and processing it with Cloud Dataflow.

In this case the "event time" is the time at which the message is published to the topic. For example, if your Dataflow job can't process the published events as fast as they are published, processing falls further behind the event time, and the system latency of the job increases.
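As a rough illustration of that lag, here is a minimal plain-Python sketch (not Beam or Pub/Sub API). The `Message` class, its `publish_time` field, and the `lag` helper are hypothetical names introduced only for this example; it assumes the publish time is used as the event time because the JSON payload carries no timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    payload: dict            # JSON body without any timestamp field
    publish_time: datetime   # assumed to be set at publish; used as event time


def lag(message: Message, processing_time: datetime) -> timedelta:
    """Return how far processing trails the event time of one message."""
    return processing_time - message.publish_time


published = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
msg = Message(payload={"user": "a", "value": 3}, publish_time=published)

# Pretend the pipeline only gets to this message 42 seconds after publishing.
processed = published + timedelta(seconds=42)
print(lag(msg, processed).total_seconds())
```

If the pipeline keeps up with the publishing rate, this difference stays small; if it falls behind, the difference grows from message to message.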

Understanding these two notions is essential when you use Beam windows. The difference between the event time (when the event was published to the Pub/Sub topic) and the time at which Dataflow actually processes it in streaming mode is the lag.

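Dataflow tracks this lag through the watermark, which is its running estimate of the event time up to which all data is expected to have arrived. You can chart the related metrics in Stackdriver (now Cloud Monitoring). The watermark is computed by Dataflow itself, and it is an estimate rather than an exact value.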

When you define windows, you can set up triggers based on this watermark and decide how to handle data that arrives late. The windows themselves are closed according to the watermark. It is not really intuitive at the beginning, but it is really helpful and powerful!
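The relationship between windows, the watermark, and late data can be sketched in plain Python as follows. This is a simplified model, not the Beam API: `WINDOW_SIZE`, `window_for`, and `is_late` are hypothetical names, timestamps are plain integers in seconds, and an element is treated as late once the watermark has reached the end of its window.

```python
from typing import Tuple

WINDOW_SIZE = 60  # seconds; an arbitrary size chosen for this sketch


def window_for(event_ts: int, size: int = WINDOW_SIZE) -> Tuple[int, int]:
    """Return the [start, end) fixed window that contains event_ts."""
    start = event_ts - event_ts % size
    return (start, start + size)


def is_late(event_ts: int, watermark: int, size: int = WINDOW_SIZE) -> bool:
    """Treat an element as late if the watermark already reached its window end."""
    _, end = window_for(event_ts, size)
    return watermark >= end


print(window_for(125))              # (120, 180)
print(is_late(125, watermark=170))  # False: window [120, 180) is still open
print(is_late(125, watermark=185))  # True: the watermark passed 180
```

A real runner also lets you configure how long late elements are still accepted and when results are emitted; this sketch only shows the basic comparison against the watermark.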

You can find more details in the Beam Programming Guide.

Event time is the time at which the event actually occurred. It has to be derived from a field in the event, for example a timestamp field.

Processing time is the time when the event is processed.

In your case, with no timestamp field in the data, you can't extract an event time from the event itself.

With event time, the data is processed based on the timestamp assigned at the source of each record. It is essentially the time at which the event was created.

Processing time is the time at which the data is received by the streaming application. This is also the time at which the data is processed by the streaming service.

From a streaming perspective, the two can be understood as the data creation timestamp and the data arrival timestamp.
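To see why the choice between creation time and arrival time matters, here is a small plain-Python sketch (not Beam code). The sample records, the 60-second window size, and the `group_by_window` helper are made up for illustration; each record carries both a creation timestamp and an arrival timestamp in seconds.

```python
from collections import defaultdict

# (name, creation_ts, arrival_ts), all in seconds; hypothetical sample data
records = [
    ("a", 5, 8),
    ("b", 58, 63),   # created in the first minute, arrived in the second
    ("c", 70, 72),
]


def group_by_window(records, ts_index, size=60):
    """Group record names into fixed windows keyed by window start time."""
    groups = defaultdict(list)
    for rec in records:
        start = rec[ts_index] - rec[ts_index] % size
        groups[start].append(rec[0])
    return dict(groups)


by_event_time = group_by_window(records, ts_index=1)
by_processing_time = group_by_window(records, ts_index=2)

print(by_event_time)        # {0: ['a', 'b'], 60: ['c']}
print(by_processing_time)   # {0: ['a'], 60: ['b', 'c']}
```

Record "b" lands in a different window depending on which timestamp is used, which is exactly the kind of discrepancy that delayed data introduces.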

