Finding the max value in Spark RDD
From the following, how can I get the tuple with the highest value?
Array[(String, Int)] = Array((a,30),(b,50),(c,20))
In this example the result I want would be (b,50).
You could use reduce():
val max_tuple = rdd.reduce((acc, value) =>
  if (acc._2 < value._2) value else acc
)
// max_tuple: (String, Int) = (b,50)
Data
val rdd = sc.parallelize(Array(("a",30),("b",50),("c",20)))
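The comparison inside that reduce() is plain Scala, so you can sanity-check the logic locally on an ordinary Array before running it on a cluster. A minimal sketch with the same data, no Spark required:

```scala
// Same single-pass comparison as the RDD reduce, on a local Array.
val data = Array(("a", 30), ("b", 50), ("c", 20))

// Keep whichever tuple has the larger second element.
val maxTuple = data.reduce((acc, value) =>
  if (acc._2 < value._2) value else acc
)

println(maxTuple) // (b,50)
```

Because reduce() may combine partial results from different partitions in any order, the combining function must be associative and commutative; this max-by-value comparison is both, which is why it is safe on an RDD.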
If you are new to Spark, I should tell you that you should use DataFrames as much as possible; they have a lot of advantages compared with RDDs. With DataFrames you can get the max like this:
import spark.implicits._
import org.apache.spark.sql.functions.max
val df = Seq(("a",30),("b",50),("c",20)).toDF("x", "y")
val x = df.sort($"y".desc).first()
Disclaimer: as @Mandy007 noted in the comments, this solution is more computationally expensive because it has to sort the data first.
This should work; at least it works for me. Hope this helps.
reduce() returns the wrong result for me. There are some other options:
val maxTemp2 = rdd.max()(Ordering[Int].on(x=>x._2))
val maxTemp3 = rdd.sortBy[Int](x=>x._2).take(1)(0)
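Both options hinge on an Ordering over the second tuple element: rdd.max() takes it as an explicit second argument list, and sortBy derives one from the key function. The same Ordering works on plain Scala collections, which makes it easy to verify locally. A sketch without Spark:

```scala
val data = Array(("a", 30), ("b", 50), ("c", 20))

// Ordering[Int].on lifts an Int ordering to tuples by extracting
// the field to compare, just as in rdd.max()(Ordering[Int].on(_._2)).
val byValue: Ordering[(String, Int)] = Ordering[Int].on(_._2)

val maxTemp   = data.max(byValue)       // single pass over the data
val maxSorted = data.sortBy(_._2).last  // sort ascending, take the largest

println(maxTemp)   // (b,50)
println(maxSorted) // (b,50)
```

As with the DataFrame answer above, the sort-based variant does more work than a single-pass max, so prefer max with a custom Ordering when you only need one element.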
Data
val rdd = sc.parallelize(Array(("a",30),("b",50),("c",20)))
We can use maxBy on the collected result like this:
rdd.reduceByKey((a,b) => a+b).collect.maxBy(_._2)
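Note that collect pulls the whole RDD back to the driver as a plain Array, so the maxBy here is ordinary Scala, not a distributed Spark operation. A local sketch of what runs on the driver after collect:

```scala
// After rdd.reduceByKey(...).collect, the driver holds Array[(String, Int)].
val collected = Array(("a", 30), ("b", 50), ("c", 20))

// maxBy picks the tuple whose second element is largest.
val winner = collected.maxBy(_._2)

println(winner) // (b,50)
```

This is fine for small results, but collect materializes everything in driver memory; for large datasets the rdd.max()(Ordering[Int].on(_._2)) approach above computes the maximum on the executors instead.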