
How do I identify that my Foundry job's stage has skew?

I have a job running with a stage that seems to be taking a long time. I've heard that this might be due to something called 'skew'.

How do I know if I'm being impacted by this?

I know this is commonly associated with joins, windows, and other operations that incur shuffles, but I don't know how to identify it.

  1. Open Spark Details for the job.
  2. Identify either the currently running stage or your slowest stage overall.
  3. Click on this stage's row to reveal the Stage Details button.
  4. Click the Stage Details button.
  5. Look at the stage metrics at the top of your screen. If you see a small number of tasks running significantly longer than the others, you have skew.
  6. If you click on the slowest task, you'll find it highlighted in the overview below, which shows the sizes of its inputs and outputs.
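
The check in step 5 can also be expressed numerically. The following is a minimal sketch, assuming you have copied the task durations out of the Stage Details view by hand; the function name `find_skewed_tasks`, the sample durations, and the 5x threshold are all hypothetical choices for illustration, not anything Foundry or Spark provides.

```python
from statistics import median


def find_skewed_tasks(durations_s, ratio_threshold=5.0):
    """Return (index, duration, ratio) for tasks far slower than the median task.

    durations_s: task durations in seconds, in any order.
    ratio_threshold: how many times the median a task must take to be flagged
                     (an arbitrary illustrative cutoff).
    """
    if not durations_s:
        return []
    typical = median(durations_s)
    if typical <= 0:
        return []
    skewed = []
    for index, duration in enumerate(durations_s):
        ratio = duration / typical
        if ratio >= ratio_threshold:
            skewed.append((index, duration, ratio))
    return skewed


# Hypothetical durations (seconds) for the tasks of one stage.
durations = [12.0, 11.5, 13.2, 12.8, 640.0, 12.1]
skewed = find_skewed_tasks(durations)
for index, duration, ratio in skewed:
    print(f"task {index}: {duration:.1f}s, {ratio:.1f}x the median")
```

The median is used rather than the mean so that the slow task itself does not pull the baseline upward and hide the gap.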

In the screenshot that accompanied this answer, one task in the job and stage takes orders of magnitude longer to run because its input size is orders of magnitude larger than that of the other tasks.

This is the definition of a skewed task / skewed stage.

If you want to know what value is causing this task to be slow, check out the guidance over here
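
As a rough illustration of what "finding the value" means, the sketch below counts how often each key appears and reports the share of rows held by the most frequent ones. It is plain Python over a hypothetical list of join-key values (`join_keys`, `top_key_shares`, and the sample data are made up for this example). On a real Spark DataFrame the analogous step would be grouping by the join column and counting, which is an assumption about your setup rather than the linked guidance.

```python
from collections import Counter


def top_key_shares(keys, n=3):
    """Return the n most frequent keys as (key, row_count, share_of_all_rows)."""
    counts = Counter(keys)
    total = sum(counts.values())
    # most_common() is empty when there are no keys, so no division by zero occurs.
    return [(key, count, count / total) for key, count in counts.most_common(n)]


# Hypothetical join-key column: one value dominates the dataset.
join_keys = ["unknown"] * 9000 + ["c1"] * 400 + ["c2"] * 350 + ["c3"] * 250
top = top_key_shares(join_keys)
for key, count, share in top:
    print(f"{key!r}: {count} rows ({share:.0%} of all rows)")
```

A single key holding most of the rows means every row with that key lands in the same task after a shuffle, which is what produces the one oversized input seen in Stage Details.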
