I have a job running with a stage that seems to be taking a long time. I've heard that this might be due to something called 'skew'.
How do I know if I'm being impacted by this?
I know this is commonly associated with joins, windows, and other operations that incur shuffles but I don't know how to identify it.
In the above example, one task in this job and stage takes orders of magnitude longer to run than the others because its input size is orders of magnitude larger than theirs.
This is the definition of a skewed task / skewed stage.
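As a rough way to put a number on that definition, you could copy the per-task input sizes out of the stage's task list and compare the largest one against the median. The sketch below is plain Python, not a Spark API. The function names, the threshold of 10x, and the sample sizes are all assumptions made up for illustration.

```python
from statistics import median


def skew_ratio(task_input_bytes):
    """Return the largest task input size divided by the median input size."""
    if not task_input_bytes:
        raise ValueError("need at least one task")
    med = median(task_input_bytes)
    if med == 0:
        # Every typical task read nothing; any non-empty task is an outlier.
        return float("inf") if max(task_input_bytes) > 0 else 1.0
    return max(task_input_bytes) / med


def is_skewed(task_input_bytes, threshold=10.0):
    """Flag the stage when one task reads at least `threshold` times the median.

    The default threshold is an arbitrary assumption, not an official cutoff.
    """
    return skew_ratio(task_input_bytes) >= threshold


# Hypothetical per-task input sizes in bytes (invented for this example).
sizes = [64_000_000, 70_000_000, 61_000_000, 66_000_000, 9_500_000_000]
print(skew_ratio(sizes))
print(is_skewed(sizes))
```

The median is used instead of the mean because a single huge task drags the mean up and can hide the very outlier you are looking for.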
If you want to know which value is causing this task to be slow, check out the guidance over here.
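Independent of that guidance, the usual idea is to count rows per join or partition key and look at the heaviest ones. Here is a minimal plain-Python sketch of that idea, with a hypothetical helper name and made-up key values. In Spark itself the equivalent would typically be a group-by on the key followed by a count, sorted descending.

```python
from collections import Counter


def heaviest_keys(keys, top_n=3):
    """Return the top_n most frequent key values with their row counts."""
    return Counter(keys).most_common(top_n)


# Hypothetical key column: one value dominates, and there are a few nulls.
keys = ["US"] * 1000 + ["DE"] * 12 + ["FR"] * 9 + [None] * 4
print(heaviest_keys(keys, top_n=2))
```

If one key (often a null or a default value) holds a share of rows far larger than the rest, that key is a likely candidate for the slow task.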