What is the difference between SPARK Partitions and Worker Cores?
I used a Standalone Spark Cluster to process several files. When I executed the Driver, the data was processed on each worker using its cores.

Now, I've read about Partitions, but I don't understand whether they are different from Worker Cores or not.

Is there a difference between setting the number of cores and the number of partitions?
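To make the distinction concrete, here is a minimal sketch; the master URL, input path, and core counts below are hypothetical assumptions, not values from the question. Cores are a resource you grant the application when you submit it, while partitions describe how a particular RDD's data is split into tasks.

```scala
// Submission side (hypothetical values): cores are a cluster resource.
//   spark-submit --master spark://master:7077 \
//     --total-executor-cores 8 \
//     --executor-cores 4 \
//     my-app.jar

import org.apache.spark.sql.SparkSession

object CoresVsPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CoresVsPartitions").getOrCreate()
    val sc = spark.sparkContext

    // Partitions: how this particular dataset is split into units of work.
    val rdd = sc.textFile("hdfs:///data/input") // hypothetical path
    println(s"partitions = ${rdd.getNumPartitions}")

    // On a standalone cluster, defaultParallelism roughly reflects the total
    // cores granted to the application.
    println(s"defaultParallelism = ${sc.defaultParallelism}")

    // The partition count can be changed without changing the core count.
    val repartitioned = rdd.repartition(100)
    println(s"after repartition = ${repartitioned.getNumPartitions}")

    spark.stop()
  }
}
```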
Simplistic view: Partition vs Number of Cores
When you invoke an action on an RDD, each partition (or task) represents a unit of work. If you have a 200G Hadoop file loaded as an RDD and chunked into 128M blocks (the Spark default), then you have ~2000 partitions in this RDD. The number of cores determines how many of those partitions can be processed at any one time; with enough cores, up to 2000 tasks (capped at the number of partitions) could execute against this RDD in parallel.
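A rough sketch of that arithmetic, assuming a hypothetical HDFS path and, say, 16 total worker cores granted to the application: the file yields on the order of 1,600-2,000 partitions, each partition becomes one task, and the scheduler runs as many tasks concurrently as there are cores until all partitions are processed.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("PartitionsVsCores").getOrCreate()
val sc = spark.sparkContext

// ~200 GB split into 128 MB blocks => roughly 2000 partitions (hypothetical path).
val rdd = sc.textFile("hdfs:///data/200g-file")
println(s"tasks per stage = ${rdd.getNumPartitions}")

// With 16 cores granted to the app, 16 of those tasks run at a time;
// the rest queue until a core frees up.
val lines = rdd.count()
println(s"lines = $lines")
```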