
Difference between reduce and reduceByKey in Apache Spark

Original post 2017-12-22 01:48:27 7 3 apache-spark

What is the difference between reduce and reduceByKey in Apache Spark in terms of functionality? Why is reduceByKey a transformation while reduce is an action?

3 answers

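This is close to a duplicate of my answer explaining reduceByKey, but I will elaborate on the specific part that makes the two different. Still, refer to my answer for more detail on the internals of reduceByKey.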

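Basically, reduce has to bring the data down to a single location, the driver, because it reduces the whole dataset to one final value (each partition is reduced locally first, and the partial results are then combined). reduceByKey, on the other hand, produces one value per key. Since that work can run locally on each machine first, the result can stay an RDD and have further transformations applied to it.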
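To make that concrete, here is a rough sketch in plain Python rather than Spark. It assumes a dataset is just a list of partitions; the names `simulate_reduce` and `simulate_reduce_by_key` and the use of `hash` as a partitioner are made up for illustration and are not Spark's API or implementation.

```python
from functools import reduce
from operator import add

# Hypothetical stand-in for an RDD: a list of partitions, each a list of records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]

def simulate_reduce(parts, func):
    """Action-like: fold each partition, then fold the partial results into ONE value."""
    partials = [reduce(func, part) for part in parts if part]
    return reduce(func, partials)

def simulate_reduce_by_key(parts, func):
    """Transformation-like: the result is still a partitioned dataset, one pair per key."""
    # Step 1: combine values per key inside each partition (no data movement yet).
    local = []
    for part in parts:
        acc = {}
        for k, v in part:
            acc[k] = func(acc[k], v) if k in acc else v
        local.append(acc)
    # Step 2: "shuffle": send each key to one output partition and merge it there.
    n = len(parts)
    out = [{} for _ in range(n)]
    for acc in local:
        for k, v in acc.items():
            target = out[hash(k) % n]  # assumed partitioner, for illustration only
            target[k] = func(target[k], v) if k in target else v
    return [sorted(p.items()) for p in out]

# reduce: a single plain value comes back to the caller.
total = simulate_reduce([[v for _, v in p] for p in partitions], add)

# reduceByKey: a partitioned collection of (key, value) pairs comes back.
by_key = simulate_reduce_by_key(partitions, add)
```

Here `total` is one number, so nothing is left to transform, while `by_key` is still a list of partitions that more steps could be chained onto. That is the shape of the action versus transformation split.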

Note, however, that you can use reduceByKeyLocally to pull the result down to a single location automatically, as a Map.

Please go through this official documentation link.

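reduce is an action that aggregates the elements of the dataset using a function func (which takes two arguments and returns one). We can also use reduce on a single RDD (for more information, click here).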

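reduceByKey, when called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V, V) => V. (For more information, click here.)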
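The (V, V) => V constraint is easy to trip over, so here is a small plain-Python sketch of what it implies. `reduce_values_by_key` is a hypothetical local helper written for this example, not a Spark function; it only mimics the rule that func sees two values of a key and must return a value of the same type.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")

def reduce_values_by_key(
    pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]
) -> Dict[K, V]:
    """Hypothetical helper: merge the values of each key with func.

    The key itself is never passed to func, only two values of type V.
    """
    out: Dict[K, V] = {}
    for k, v in pairs:
        out[k] = func(out[k], v) if k in out else v
    return out

scores = [("ann", 4.0), ("bob", 3.0), ("ann", 2.0)]

# (float, float) -> float: input and output types match, so this fits directly.
best = reduce_values_by_key(scores, max)

# A mean is not (V, V) -> V on raw floats, so lift each value to a
# (sum, count) pair first; merging two such pairs yields another pair.
lifted = [(k, (v, 1)) for k, v in scores]
sums = reduce_values_by_key(lifted, lambda a, b: (a[0] + b[0], a[1] + b[1]))
means = {k: s / n for k, (s, n) in sums.items()}
```

The same lifting trick is the usual way to get an average per key when the merge function has to return the type it takes.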

This is from Qt Assistant:

reduce(f): Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently reduces partitions locally.

reduceByKey(func, numPartitions=None, partitionFunc=): Merge the values for each key using an associative and commutative reduce function.
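Why the operator has to be associative and commutative is easiest to see by splitting the same data into different numbers of partitions. The sketch below is plain Python, with `reduce_partitioned` as a made-up model of "reduce each partition locally, then combine the partial results"; it is not how Spark schedules anything.

```python
from functools import reduce
from operator import add, sub

def reduce_partitioned(data, func, num_partitions):
    """Fold each chunk locally, then fold the partial results together."""
    size = -(-len(data) // num_partitions)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    partials = [reduce(func, chunk) for chunk in chunks]
    return reduce(func, partials)

data = [1, 2, 3, 4, 5, 6]

# Addition is associative and commutative: the answer ignores the partitioning.
sums = {n: reduce_partitioned(data, add, n) for n in (1, 2, 3)}

# Subtraction is neither: the answer changes with the number of partitions.
diffs = {n: reduce_partitioned(data, sub, n) for n in (1, 2, 3)}

print(sums)   # {1: 21, 2: 21, 3: 21}
print(diffs)  # {1: -19, 2: 3, 3: 1}
```

With an operator like subtraction the result depends on how the data happens to be split, which is why both reduce and reduceByKey ask for an operator whose grouping and ordering do not matter.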
