
What is the difference between Spark Partitions and Worker Cores?

Original 2016-11-21 20:45:53 5 2 java/ hadoop/ apache-spark

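I am using a Standalone Spark cluster to process several files. When I run the driver, the data is processed on each worker using its cores.

Now, I have read about Partitions, but I don't understand whether they are different from Worker Cores.

Is there a difference between setting the number of cores and setting the number of partitions?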

2 Answers

Simple view: partitions vs. number of cores

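When you invoke an action on an RDD:

A "Job" is created for it. So a Job is a piece of work submitted to Spark.
Jobs are divided into "Stages" at the shuffle boundaries.
Each stage is further divided into tasks based on the number of partitions of the RDD. So a Task is the smallest unit of work in Spark.
Now, how many of these tasks can be executed simultaneously depends on the "number of cores" available.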
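The job, stage, and task breakdown above can be sketched in plain Python. This is only an illustrative model, not Spark's scheduler: the `Stage` class and `build_job` helper are hypothetical names, and the sketch assumes a linear lineage in which n shuffle boundaries produce n + 1 stages.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    stage_id: int
    num_partitions: int  # partitions of the RDD computed in this stage

    def tasks(self) -> List[Tuple[int, int]]:
        # One task per partition, identified as (stage_id, partition_index).
        return [(self.stage_id, p) for p in range(self.num_partitions)]


def build_job(partitions_per_stage: List[int]) -> List[Stage]:
    # Assumption: a linear lineage, so n shuffle boundaries give n + 1 stages
    # and the input list has one partition count per stage.
    return [Stage(i, n) for i, n in enumerate(partitions_per_stage)]


# Hypothetical job: 4 partitions before the shuffle, 2 partitions after it.
job = build_job([4, 2])
total_tasks = sum(len(stage.tasks()) for stage in job)
print(len(job), "stages,", total_tasks, "tasks")  # prints "2 stages, 6 tasks"
```

The number of tasks depends only on the partition counts; the number of cores does not appear anywhere in this model, because it affects how many tasks run at once rather than how many tasks exist.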

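A partition (or task) refers to a unit of work. If you have a 200G Hadoop file loaded as an RDD and chunked by 128M (Spark default), then this RDD has on the order of 2000 partitions (200 GB / 128 MB is 1,600 with binary units). The number of cores determines how many partitions can be processed at any one time, and the number of partitions/tasks is the upper limit on how many can run in parallel for this RDD.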
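The arithmetic behind this can be written out as a small sketch. The helper names and the 16-core cluster are hypothetical, and the sketch assumes each task occupies exactly one core (Spark's default `spark.task.cpus` of 1). With binary units, 200 GB split into 128 MB chunks gives 1,600 partitions, the same order of magnitude as the figure quoted in the answer.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-a // b)


def num_partitions(file_bytes: int, chunk_bytes: int) -> int:
    # One partition per chunk of the input file.
    return ceil_div(file_bytes, chunk_bytes)


def max_parallel_tasks(partitions: int, total_cores: int) -> int:
    # Assumption: one core per task, so concurrency is capped by both
    # the number of cores and the number of partitions.
    return min(partitions, total_cores)


def waves(partitions: int, total_cores: int) -> int:
    # Number of rounds needed if every task took the same time.
    return ceil_div(partitions, total_cores)


parts = num_partitions(200 * GB, 128 * MB)
print(parts)                            # 1600
print(max_parallel_tasks(parts, 16))    # 16
print(waves(parts, 16))                 # 100
print(max_parallel_tasks(parts, 4000))  # 1600
```

Under these assumptions, adding cores beyond the partition count buys no extra parallelism for this RDD, while adding partitions beyond the core count only increases the number of waves.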


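Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.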
