Column comparison in Spark Scala

Question

I have two DataFrames like this:

scala> df1.show

+---+---------+
| ID|    Count|
+---+---------+
|  1|20.565656|
|  2|30.676776|
+---+---------+

scala> df2.show

+---+-----------+
| ID|      Count|
+---+-----------+
|  1|10.00998787|
|  2|    40.7767|
+---+-----------+

How can I take the max of the Count column after a join?

Expected output:

+---+---------+
| id|    Count|
+---+---------+
|  1|20.565656|
|  2|  40.7767|
+---+---------+

Answer 1

After joining both DataFrames, create a UDF that takes the two Count columns as input and returns the greater of the two values.

A UDF is a reasonable option when a single column has to be derived from multiple columns, although a built-in column function is usually preferable when one fits.
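
A minimal sketch of this UDF approach, assuming both Count columns hold non-null doubles and the DataFrames match the ones in the question. The names `MaxCountUdf`, `maxOf` and `maxCountWithUdf` are illustrative and do not come from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object MaxCountUdf {
  // Hypothetical UDF: returns the larger of two doubles.
  private val maxOf = udf((a: Double, b: Double) => math.max(a, b))

  // Inner join on ID, then apply the UDF to the two Count columns.
  def maxCountWithUdf(left: DataFrame, right: DataFrame): DataFrame =
    left.as("x")
      .join(right.as("y"), Seq("ID"))
      .select(col("ID"), maxOf(col("x.Count"), col("y.Count")).as("Count"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the post).
    val spark = SparkSession.builder().master("local[*]").appName("max-count-udf").getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, 20.565656), (2, 30.676776)).toDF("ID", "Count")
    val df2 = Seq((1, 10.00998787), (2, 40.7767)).toDF("ID", "Count")

    maxCountWithUdf(df1, df2).show()
    spark.stop()
  }
}
```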

Answer 2

You can do this:

df1.union(df2).groupBy("ID").max("Count").show()

+---+----------+
| ID|max(Count)|
+---+----------+
|  1| 20.565656|
|  2|   40.7767|
+---+----------+
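
To get the column name from the expected output instead of `max(Count)`, the aggregate can be aliased. A small sketch under the same assumptions as the question's data; `MaxCountUnion` and `maxCountByUnion` are illustrative names, and the `orderBy` is added only to make the displayed order stable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.max

object MaxCountUnion {
  // Stack both DataFrames, then keep the largest Count per ID.
  def maxCountByUnion(left: DataFrame, right: DataFrame): DataFrame =
    left.union(right)
      .groupBy("ID")
      .agg(max("Count").as("Count")) // alias replaces the default "max(Count)"
      .orderBy("ID")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the post).
    val spark = SparkSession.builder().master("local[*]").appName("max-count-union").getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, 20.565656), (2, 30.676776)).toDF("ID", "Count")
    val df2 = Seq((1, 10.00998787), (2, 40.7767)).toDF("ID", "Count")

    maxCountByUnion(df1, df2).show()
    spark.stop()
  }
}
```

Unlike a join, `union` keeps IDs that appear in only one of the two DataFrames, and it matches columns by position rather than by name.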

Answer 3

scala> df.show()
+---+---------+
| ID|    Count|
+---+---------+
|  1|20.565656|
|  2|30.676776|
+---+---------+


scala> df1.show()
+---+-----------+
| ID|      Count|
+---+-----------+
|  1|10.00998787|
|  2|    40.7767|
+---+-----------+


scala> df.alias("x").join(df1.alias("y"), List("ID"))
                    .select(col("ID"), col("x.Count").alias("Xcount"), col("y.Count").alias("Ycount"))
                    .withColumn("Count", when(col("Xcount") >= col("Ycount"), col("Xcount")).otherwise(col("Ycount")))
                    .drop("Xcount", "Ycount")
                    .show()
+---+---------+
| ID|    Count|
+---+---------+
|  1|20.565656|
|  2|  40.7767|
+---+---------+
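
The same join can also be written with Spark's built-in `greatest` function in place of the `when`/`otherwise` comparison. A minimal sketch, assuming the two DataFrames from the question; `MaxCountGreatest` and `maxCountWithGreatest` are illustrative names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, greatest}

object MaxCountGreatest {
  // Inner join on ID, then take the row-wise maximum of the two Count columns.
  def maxCountWithGreatest(left: DataFrame, right: DataFrame): DataFrame =
    left.as("x")
      .join(right.as("y"), Seq("ID"))
      .select(col("ID"), greatest(col("x.Count"), col("y.Count")).as("Count"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the post).
    val spark = SparkSession.builder().master("local[*]").appName("max-count-greatest").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 20.565656), (2, 30.676776)).toDF("ID", "Count")
    val df1 = Seq((1, 10.00998787), (2, 40.7767)).toDF("ID", "Count")

    maxCountWithGreatest(df, df1).show()
    spark.stop()
  }
}
```

`greatest` skips null values, so a row with a null on one side still yields the other side's count. The `when`/`otherwise` version returns the second column, which may itself be null, whenever the comparison evaluates to null.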

3 answers

solution1
1 2020-01-29 17:42:26

solution2
1 2020-01-29 17:44:47

solution3
0 ACCEPTED 2020-01-29 18:03:24
