
Spark aggregateByKey partitioner order

Original post: 2016-01-25 04:15:28 | Tags: scala, apache-spark, rdd

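If I pass a hash partitioner to Spark's aggregateByKey function, i.e. myRDD.aggregateByKey(0, new HashPartitioner(20))(combOp, mergeOp)

Is myRDD repartitioned first, before its key/value pairs are aggregated using combOp and mergeOp? Or does myRDD go through combOp and mergeOp first, with the resulting RDD then repartitioned using the HashPartitioner?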

1 Answer

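aggregateByKey applies map-side aggregation before the eventual shuffle. Because each partition is processed sequentially, the only operations applied in this phase are initialization (creating the zeroValue) and combOp. The purpose of mergeOp is to combine aggregation buffers, so it is not used before the shuffle.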
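To make that order concrete, here is a rough plain-Scala model of the three phases. It is only a sketch and not Spark's actual implementation: the object name AggregateOrderSketch and the helpers mapSide, targetPartition and aggregate are made up for illustration, and combOp / mergeOp follow the naming in the question (Spark's own parameter names for the same two functions are seqOp and combOp).

```scala
// A plain-Scala model of the order of operations; no Spark dependency.
// All names here are hypothetical and exist only for this illustration.
object AggregateOrderSketch {
  val zeroValue = 0

  // Folds one value into a per-key buffer (runs before the shuffle).
  def combOp(buffer: Int, value: Int): Int = buffer + value

  // Merges two buffers for the same key (runs after the shuffle).
  def mergeOp(left: Int, right: Int): Int = left + right

  // Phase 1: map-side aggregation inside a single input partition.
  def mapSide(partition: Seq[(String, Int)]): Map[String, Int] =
    partition.foldLeft(Map.empty[String, Int]) { case (buffers, (key, value)) =>
      buffers.updated(key, combOp(buffers.getOrElse(key, zeroValue), value))
    }

  // Phase 2: pick the output partition for a key, the way a hash partitioner would.
  def targetPartition(key: String, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Phase 3: merge the buffers that ended up in the same output partition.
  def aggregate(input: Seq[Seq[(String, Int)]], numPartitions: Int): Map[String, Int] = {
    val preAggregated = input.flatMap(partition => mapSide(partition).toSeq)
    val shuffled = preAggregated.groupBy { case (key, _) => targetPartition(key, numPartitions) }
    shuffled.values.flatMap { pairs =>
      pairs.groupBy(_._1).map { case (key, buffers) => key -> buffers.map(_._2).reduce(mergeOp) }
    }.toMap
  }
}
```

With two input partitions Seq("a" -> 1, "b" -> 2, "a" -> 3) and Seq("a" -> 4, "b" -> 5), mapSide reduces the first partition to a -> 4, b -> 2 before anything is moved, and aggregate(input, 20) ends with a -> 8, b -> 7 once the buffers are merged.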

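If the input RDD is a ShuffledRDD with the same partitioner as the one requested for aggregateByKey, the data is not shuffled at all and is aggregated locally using mapPartitions.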
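One way to check this yourself is to compare the dependencies of the two results. The sketch below assumes a local Spark installation on the classpath. The object name NoShuffleSketch and the helpers addsShuffle and check are hypothetical, and the sample data is made up.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, written only to illustrate the point above.
object NoShuffleSketch {
  // True if the RDD's direct parent link is a shuffle.
  def addsShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  // Returns (shuffle added for pre-partitioned input, shuffle added for plain input).
  def check(sc: SparkContext): (Boolean, Boolean) = {
    val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))

    // partitionBy shuffles once and yields an RDD that remembers its partitioner.
    val prePartitioned = pairs.partitionBy(new HashPartitioner(20))

    // Same partitioner settings as the input: no further shuffle should be needed.
    val fromPrePartitioned =
      prePartitioned.aggregateByKey(0, new HashPartitioner(20))(_ + _, _ + _)

    // Input has no partitioner: aggregateByKey has to shuffle.
    val fromPlain = pairs.aggregateByKey(0, new HashPartitioner(20))(_ + _, _ + _)

    (addsShuffle(fromPrePartitioned), addsShuffle(fromPlain))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("no-shuffle-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (prePartitioned, plain) = check(sc)
      println(s"pre-partitioned input adds a shuffle: $prePartitioned")
      println(s"plain input adds a shuffle: $plain")
    } finally sc.stop()
  }
}
```

If the shortcut described above applies, the first flag should come back false and the second true. Calling toDebugString on either result is another way to see where the stage boundaries fall.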

