I used a Standalone Spark cluster to process several files. When I executed the driver, the data was processed on each worker using its cores.
Now I've read about partitions, but I couldn't tell whether they are different from worker cores or not.
Is there a difference between setting the number of cores and setting the number of partitions?
Simplistic view: Partition vs Number of Cores
When you invoke an action on an RDD, Spark launches a job that is broken into tasks, one per partition.
A partition (or task) refers to a unit of work. If you have a 200 GB Hadoop file loaded as an RDD and chunked into 128 MB blocks (the Spark/HDFS default), this RDD has roughly 1,600 partitions. The number of cores determines how many of those partitions can be processed at any one time; with enough cores, up to ~1,600 tasks (capped at the number of partitions) could execute in parallel for this RDD.
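As a minimal sketch of how the two settings differ, here is a Scala snippet for a standalone cluster. The master URL, the HDFS path, and the core/partition counts are illustrative assumptions, not values from the question:

```scala
import org.apache.spark.sql.SparkSession

object PartitionsVsCores {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionsVsCores")
      .master("spark://master-host:7077")  // assumed standalone master URL
      .config("spark.cores.max", "8")      // total cores the app may use across workers
      .getOrCreate()
    val sc = spark.sparkContext

    // Each 128 MB block of the input becomes one partition of the RDD.
    val rdd = sc.textFile("hdfs:///data/big-file")   // hypothetical path
    println(s"Partitions: ${rdd.getNumPartitions}")  // e.g. ~1,600 for a 200 GB file
    println(s"Default parallelism: ${sc.defaultParallelism}")

    // The partition count can be changed independently of the core count;
    // only 8 of these tasks (spark.cores.max above) run at any one time.
    val repartitioned = rdd.repartition(400)
    println(s"After repartition: ${repartitioned.getNumPartitions}")

    spark.stop()
  }
}
```

So cores bound concurrency (how many tasks run at once), while the partition count bounds the total number of tasks the job is split into.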