Spark Structured Streaming watermark error

Question

| A   | B                                      |
|-----|----------------------------------------|
| ABC | [{C:1, D:1}, {C:2, D:4}]               |
| XYZ | [{C:3, D:6}, {C:9, D:11}, {C:5, D:12}] |

Answer 1

As I understand it, watermarking is required only when you perform a window operation on event time. Spark uses the watermark to handle late data, and to do that it has to keep the state of older aggregations until the watermark has passed them.
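
To make the late-data bookkeeping concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark's implementation: it assumes integer timestamps in seconds, tumbling windows, and a watermark that advances only at the end of each batch to "largest event time seen minus the allowed delay". The class name `WindowedCounter` and its fields are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class WindowedCounter:
    """Toy model of event-time window counts with a watermark (not Spark's code)."""

    def __init__(self, window_size: int, allowed_delay: int) -> None:
        self.window_size = window_size      # window length, in seconds
        self.allowed_delay = allowed_delay  # how late data may arrive, in seconds
        self.max_event_time = 0             # largest event time seen so far
        self.watermark = 0                  # windows ending at or before this are closed
        self.state = defaultdict(int)       # window start -> running count (kept in memory)
        self.finalized = {}                 # window start -> final count
        self.dropped = []                   # events that arrived too late

    def process_batch(self, event_times):
        for t in event_times:
            start = t - t % self.window_size
            if start + self.window_size <= self.watermark:
                self.dropped.append(t)      # its window is already closed: too late
            else:
                self.state[start] += 1      # on time, or late but still tolerated
            self.max_event_time = max(self.max_event_time, t)
        # The watermark advances only after the whole batch has been processed.
        self.watermark = max(self.watermark, self.max_event_time - self.allowed_delay)
        # Close windows that ended at or before the watermark and free their state.
        for start in sorted(self.state):
            if start + self.window_size <= self.watermark:
                self.finalized[start] = self.state.pop(start)


counter = WindowedCounter(window_size=10, allowed_delay=5)
counter.process_batch([1, 4, 12])  # watermark becomes 7, nothing is closed yet
counter.process_batch([3, 17])     # 3 is late but accepted; watermark becomes 12, window [0, 10) closes
counter.process_batch([8, 21])     # 8 belongs to the closed window [0, 10) and is dropped
print(counter.finalized, counter.dropped)  # {0: 3} [8]
```

The `state` dictionary is the "older aggregation" that has to be kept around: without a watermark there is no point at which a window can safely be closed and its entry discarded.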

The following link explains this well, with an example: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking

I don't see any window operations in your transformation. If that is the case, you can try running the streaming query without a watermark.

Answer 2

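When you group in Spark Structured Streaming, the DataFrame must already have a watermark, and the grouping has to take it into account: include a window over the watermarked event-time column in your aggregation.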

    df.groupBy(col("dummy"), window(col("event_time"), "1 day"))

Answer 1: 0, accepted, 2018-11-02 11:19:35
Answer 2: 0, 2022-01-08 08:36:52
