
ADF Tumbling Window Trigger Dependency

Original post: 2023-01-04 09:37:48. Tags: azure, azure-data-factory, azure-data-factory-2

I went over the link but could not understand what the diagram is trying to convey. The diagram is not conclusive enough, and there is no explanation.

Consider the following diagram from the doc:

The Dependency Offset part seems clear. The second diagram in Dependency Offset tells me that Trigger A's 10 o'clock run should depend on Trigger B's 9 o'clock run.

The Dependency Size part does not quite make sense to me. How should I interpret it? For example, the second example in the Dependency Size section says Offset is -1 hr and Size is 2 hrs. What does the size refer to here? The 2 hr size defined there appears to create a 1 hr overlap (between 10 and 11) with the complete window of Trigger A, which would effectively leave Trigger A with no time to execute. I am sure I am interpreting it wrong. Could someone please help?


1 Answer

Imagine you have a job that runs every hour to produce "usage metrics" for the past hour. Imagine "Trigger B" runs this job hourly.

Now you want a job (Trigger A) that creates a "rolling average" of usage metrics over the past two hours. To reduce computational requirements, you can produce this by taking the average of the usage metrics produced for the last two hours (the calculations output by Trigger B).

You still run this "rolling average" job (Trigger A) every hour, but it relies not only on the most recent Trigger B run for the current hour, but on the previous one as well. A "Dependency Size" of 2 hours (with an offset of -1 hour) means Trigger A will not run until both Trigger B runs (for this hour and for the last hour) have completed successfully.
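To make the offset/size arithmetic concrete, here is a minimal Python sketch of how I read it. It is not ADF code: the function name `dependency_windows` and its parameters are hypothetical, and it assumes Trigger B's windows line up with the start of the dependency range.

```python
from datetime import datetime, timedelta


def dependency_windows(window_start, offset, size, upstream_interval):
    """List the upstream (Trigger B) windows that one downstream
    (Trigger A) window waits on.

    Hypothetical helper for illustration only; it models the behaviour
    described above, not ADF's actual implementation.
    """
    # The dependency range starts at the downstream window start plus the offset
    range_start = window_start + offset
    # ...and spans `size` from there
    range_end = range_start + size

    windows = []
    current = range_start
    while current < range_end:
        windows.append((current, current + upstream_interval))
        current += upstream_interval
    return windows


hour = timedelta(hours=1)
a_start = datetime(2023, 1, 4, 10, 0)  # Trigger A's 10:00-11:00 window

# Offset -1 hr, Size 2 hrs, Trigger B running hourly
deps = dependency_windows(a_start, offset=-hour, size=2 * hour, upstream_interval=hour)
for start, end in deps:
    print(f"Trigger B window {start:%H:%M}-{end:%H:%M}")
```

Under these assumptions, Trigger A's 10:00-11:00 window waits on Trigger B's 09:00-10:00 and 10:00-11:00 windows. So the size describes how much of Trigger B's timeline must have succeeded, not time taken away from Trigger A.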

Agreed that the diagram conveys the concept poorly, for a number of reasons. For example, in my opinion Trigger B should be named A and vice versa, to make the dependency order more intuitive.

Also, the individual/multiple pipeline runs should be visualized within the windows. Something like this might have been better: