
Spark executor & tasks concurrency

In Spark, an executor may run several tasks concurrently, perhaps 2, 5, or 6.

How does Spark figure out (or calculate) the number of tasks to run concurrently in the same executor, i.e. how many tasks can run in an executor at once?

An executor may be executing one task, yet one more task may be placed on the same executor to run concurrently. What are the criteria for that?

An executor has a fixed number of cores and a fixed amount of memory. Since we do not specify memory and core requirements for tasks in Spark, how can one calculate how many tasks can run concurrently in an executor?

The number of tasks that run in parallel within an executor equals the number of cores configured for it (the spark.executor.cores setting). You can always change this number through configuration. The total number of tasks an executor runs overall (in parallel or sequentially) depends on the total number of tasks created (determined by the number of splits) and on the number of executors.
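The arithmetic behind that rule can be sketched in a few lines of plain Python. This is only an illustration, not Spark code: the function names are hypothetical, and the only Spark-specific facts assumed are that each task reserves spark.task.cpus cores (default 1) out of the executor's spark.executor.cores. The "waves" figure is an idealized estimate that assumes all tasks take about the same time.

```python
import math


def concurrent_task_slots(executor_cores, task_cpus=1):
    """Tasks one executor can run at the same time.

    executor_cores stands for spark.executor.cores,
    task_cpus stands for spark.task.cpus (default 1).
    """
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("cores and task_cpus must be positive")
    return executor_cores // task_cpus


def scheduling_waves(num_tasks, num_executors, executor_cores, task_cpus=1):
    """Idealized number of rounds needed to run all tasks of a stage."""
    total_slots = num_executors * concurrent_task_slots(executor_cores, task_cpus)
    if total_slots == 0:
        raise ValueError("no task fits: task_cpus exceeds executor_cores")
    return math.ceil(num_tasks / total_slots)


if __name__ == "__main__":
    # 4 cores per executor, 1 core per task: 4 tasks at once per executor.
    print(concurrent_task_slots(4))
    # 200 partitions over 5 executors with 4 cores each: 20 slots in total.
    print(scheduling_waves(200, 5, 4))
```

So the "2 or 5 or 6" concurrent tasks observed in the question would simply reflect how many cores each executor was given, divided by the cores reserved per task.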

All tasks running in one executor share the memory configured for that executor. Internally, the executor just runs as many task threads as it has cores.
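As a rough analogy for the threading model described above (not Spark's actual implementation), the following Python sketch runs tasks on a thread pool whose size equals the number of slots. The names run_tasks and shared_cache are made up for illustration; the dictionary stands in for the executor memory that all task threads share.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tasks(num_tasks, slots):
    """Run num_tasks on a pool of `slots` threads.

    Returns the shared result dict and the peak number of tasks
    that were running at the same moment.
    """
    shared_cache = {}          # one memory space shared by all task threads
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def task(partition_id):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)       # pretend to process a partition
        with lock:
            shared_cache[partition_id] = partition_id * 2
            state["running"] -= 1
        return partition_id

    # The pool never runs more than `slots` tasks at once; the rest wait.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        list(pool.map(task, range(num_tasks)))
    return shared_cache, state["peak"]


if __name__ == "__main__":
    cache, peak = run_tasks(num_tasks=8, slots=3)
    print(len(cache), "tasks finished, at most", peak, "ran concurrently")
```

Because the threads live in one process, a single memory-hungry task can put pressure on every other task in the same executor, which is why giving an executor more cores without more memory tends to reduce the memory available per task.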

The most probable cause is skewed partitions in the RDD you are processing. If 2-6 partitions hold a lot of data, then, to reduce data shuffling over the network, Spark will try to have executors process the data that resides locally on their own nodes. So you'll see those 2-6 executors working for a long time, while the others finish their data within a few milliseconds.
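The effect of skew on busy time can be illustrated with a small simulation. This is a simplified model in plain Python, not Spark's scheduler: it ignores data locality entirely, assumes a task's duration is proportional to its partition size, and hands each task to whichever slot frees up first. The function name simulate_stage and the partition sizes are hypothetical.

```python
import heapq


def simulate_stage(partition_sizes, total_slots):
    """Greedy simulation of one stage.

    Returns (finish_time_of_stage, busy_time_per_slot), with one task
    per partition and duration equal to the partition size.
    """
    # Heap of (time the slot becomes free, slot index).
    slots = [(0, i) for i in range(total_slots)]
    heapq.heapify(slots)
    busy = [0] * total_slots

    for size in partition_sizes:
        free_at, slot = heapq.heappop(slots)   # earliest available slot
        busy[slot] += size
        heapq.heappush(slots, (free_at + size, slot))

    finish = max(free_at for free_at, _ in slots)
    return finish, busy


if __name__ == "__main__":
    even = [10] * 8                        # 80 units of work, evenly split
    skewed = [70, 1, 1, 1, 1, 1, 1, 4]     # same 80 units, one huge partition
    print(simulate_stage(even, 4))
    print(simulate_stage(skewed, 4))
```

With the same total amount of work, the skewed run is bounded by its largest partition: one slot stays busy for the whole stage while the others finish almost immediately, which matches the pattern of a few long-running executors described above.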

You can find more about this in this Stack Overflow question.

