简体   繁体   中英

How do I identify that my Foundry job's stage has skew?

I have a job running with a stage that seems to be taking a long time. I've heard that this might be due to something called 'skew'.

How do I know if I'm being impacted by this?

I know this is commonly associated with joins, windows, and other operations that incur shuffles but I don't know how to identify it.

  1. Open Spark Details like this
  2. Identify either currently running stage or your slowest stage overall
  3. Click on this stage's row to reveal Stage Details button阶段详细信息按钮
  4. Click on Stage Details button
  5. Look at metrics of stage at the top of your screen. If you see a smaller number of tasks running significantly longer than the others, this means you have skew倾斜
  6. If you click on the slowest task, you'll find the task highlighted in the overview below, which will indicate the sizes of inputs / outputs. 歪斜细节

In the above example, there is a task in this job + stage that is taking orders of magnitude longer to run because its input size is orders of magnitude larger than the other tasks.

This is the definition of a skewed task / skewed stage.

If you want to know what value is causing this task to be slow, check out the guidance over here

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM