简体   繁体   中英

How do I make my Spark job run faster using executors?

I know my code is free from antipatterns since I don't have any warnings in my Authoring code editor, so I know my code is doing PySpark operations that are distributed and scalable.

My current job has 2 executors assigned to it with 2 cores each, and it runs with task parallelism of 16 as seen on the Spark Details page.

How do I make this job run faster?

Your Executors are the pieces of Spark infrastructure assigned to 'execute' your work. As such, the more of these 'workers' you have, the more work you are able to do in parallel and the faster your job will be.

There's a limit to the amount your job will increase in speed however, and this is a function of the max number of tasks in your stages.

For instance, if my data scale is such that I only ever have a maximum of 8 tasks (let's assume AQE is controlling this), assigning an executor count to run more than 8 tasks will waste resources and won't increase your job speed.

The job defaults in most Foundry environments are 2 executors with 2 cores each, and 1 core per task. This means your job is capable of running 4 cores at a time, which means 4 tasks.

This means if your max task counts per stage in your job is 4, you won't benefit from boosting your number of executors. If, however, you observe your stages have, for instance, 16 tasks, then you can choose to increase the number of executors in your job as such:

16 max tasks, 1 core per task. -> 16 cores needed.
2 cores per executor -> 8 executors max.

We could therefore jump this example job up to 8 executors for maximum performance.

For the original question, you would bump the number of executors to 8 for maximum performance. Any further bump of executor count would not be used and would waste resources.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM