Spark aggregateByKey partitioner order

Original post: 2016-01-25 04:15:28. Tags: scala, apache-spark, rdd

Suppose I pass a hash partitioner to Spark's aggregateByKey function, i.e. myRDD.aggregateByKey(0, new HashPartitioner(20))(combOp, mergeOp).

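Is myRDD repartitioned first, before its key/value pairs are aggregated with combOp and mergeOp? Or does myRDD go through combOp and mergeOp first, with the resulting RDD then repartitioned using the HashPartitioner?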

1 Answer

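aggregateByKey applies map-side aggregation before the eventual shuffle. Because each partition is processed sequentially, the only operations applied in this phase are initialization (creating the zeroValue) and combOp. The purpose of mergeOp is to combine aggregation buffers, so it is not used before the shuffle.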
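
The order of operations described above can be sketched in plain Scala, without Spark, as a rough model rather than Spark's actual implementation. The object AggregateOrderSketch and its helpers are hypothetical names introduced only for illustration. The names combOp and mergeOp follow the question; in Spark's own signature the same two functions are called seqOp and combOp.

```scala
// Plain-Scala simulation (no Spark dependency) of the order of operations.
// All names here are illustrative assumptions, not Spark APIs.
object AggregateOrderSketch {
  // combOp folds one value into a partition-local buffer;
  // mergeOp merges two buffers for the same key after the shuffle.
  val combOp: (Int, Int) => Int = _ + _
  val mergeOp: (Int, Int) => Int = _ + _
  val zeroValue = 0

  // Non-negative modulo of the key's hash code, mirroring how a
  // hash partitioner picks a target partition.
  def targetPartition(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def aggregate(input: Seq[Seq[(String, Int)]], numPartitions: Int): Map[String, Int] = {
    // Step 1 (map side, before any shuffle): one buffer per key per input partition.
    val mapSide: Seq[Map[String, Int]] = input.map { part =>
      part.foldLeft(Map.empty[String, Int]) { case (buffers, (k, v)) =>
        buffers.updated(k, combOp(buffers.getOrElse(k, zeroValue), v))
      }
    }
    // Step 2 (shuffle): route each partial buffer to the partition chosen by key hash.
    val shuffled: Map[Int, Seq[(String, Int)]] =
      mapSide.flatten.groupBy { case (k, _) => targetPartition(k, numPartitions) }
    // Step 3 (reduce side): merge the buffers that arrived for the same key.
    shuffled.values.flatMap { part =>
      part.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(mergeOp) }
    }.toMap
  }
}
```

For example, with two input partitions Seq("a" -> 1, "b" -> 2, "a" -> 3) and Seq("a" -> 4, "b" -> 5), step 1 produces the partial buffers a -> 4, b -> 2 and a -> 4, b -> 5, and only those partial buffers are moved and merged, giving a -> 8 and b -> 7.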

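If the input RDD is a ShuffledRDD with the same partitioner as the one requested for aggregateByKey, the data is not shuffled at all and is aggregated locally using mapPartitions.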
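
One way to check this case is to partition the RDD up front and then look at the lineage of the aggregated result. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable. The object name SamePartitionerSketch, the method reusesPartitioning, and the sample data are hypothetical. The sum functions stand in for the question's combOp and mergeOp.

```scala
import org.apache.spark.{HashPartitioner, OneToOneDependency, SparkConf, SparkContext}

object SamePartitionerSketch {
  // Returns true when aggregateByKey reused the existing partitioning,
  // i.e. the result depends on its parent through a narrow dependency.
  def reusesPartitioning(sc: SparkContext): Boolean = {
    val partitioner = new HashPartitioner(20)
    // partitionBy performs the only shuffle expected in this lineage.
    val prePartitioned = sc
      .parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))
      .partitionBy(partitioner)
    // Same partitioner as the input, so no further shuffle is expected.
    val aggregated = prePartitioned.aggregateByKey(0, partitioner)(_ + _, _ + _)
    println(aggregated.toDebugString) // print the lineage for inspection
    aggregated.dependencies.head.isInstanceOf[OneToOneDependency[_]]
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("same-partitioner-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(reusesPartitioning(sc)) finally sc.stop()
  }
}
```

If the partitioner passed to aggregateByKey differs from the one already on the RDD, for instance a HashPartitioner with another partition count, the lineage should instead show a second shuffle stage.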
