
Handle too-late data in Spark Streaming

A watermark allows late-arriving data to be included in already computed windowed results for a limited period of time. Its premise is that it tracks a point in time before which no more late events are expected to arrive; if they do arrive, they are nonetheless discarded.

Is there a way to store the discarded data so that it can be used for reconciliation later? Say that in my Structured Streaming job I set the watermark to 1 hour, perform a window operation every 10 minutes, and receive an event 20 minutes late. Is there a way to store the discarded data at a different location rather than dropping it?

No, there is no way to achieve this.
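
To make the watermark rule from the question concrete, here is a minimal pure-Python sketch. It is not Spark code and does not use any Spark API: `split_by_watermark` is a hypothetical helper, and it simplifies the real behaviour by advancing the watermark after every event (Spark advances it once per micro-batch) and by comparing the event time itself rather than the end of the event's window. It only illustrates which events would count as too late and how they could be set aside instead of dropped.

```python
from datetime import datetime, timedelta


def split_by_watermark(events, delay):
    """Partition (event_time, payload) pairs, given in arrival order,
    into events that are kept and events that are too late.

    Simplified model (an assumption, not Spark's exact algorithm):
    watermark = max event time seen so far - delay.
    """
    max_event_time = None
    kept, too_late = [], []
    for event_time, payload in events:
        # The watermark is derived only from events that arrived earlier.
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and event_time < watermark:
            # Older than the watermark: set aside instead of discarding.
            too_late.append((event_time, payload))
        else:
            kept.append((event_time, payload))
        # Track the newest event time seen so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return kept, too_late


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    (t0, "a"),
    (t0 + timedelta(minutes=90), "b"),  # newest event, moves the watermark forward
    (t0 + timedelta(minutes=70), "c"),  # 20 minutes behind the newest event
    (t0 + timedelta(minutes=10), "d"),  # 80 minutes behind the newest event
]
kept, too_late = split_by_watermark(events, delay=timedelta(hours=1))
```

With a 1-hour delay in this model, the event that is 20 minutes behind the newest one is still kept, and only the event more than an hour behind ends up in `too_late`, which is the list a reconciliation step could read from.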


 