Spark Scala: perform mathematical operations over columns in two DataFrames
We have two DataFrames:
+----+--------+--------+
| id | height | weight |
+----+--------+--------+
| 1 | 12 | 5 |
| 2 | 8 | 7 |
+----+--------+--------+
and another one:
+----+------+--------+
| id | Area | Area_2 |
+----+------+--------+
| 1 | | |
| 2 | | |
+----+------+--------+
I need to join the two DataFrames on id so that the second DataFrame ends up like this:
+----+---------+--------+
| id | Area | Area_2 |
+----+---------+--------+
| 1 | (12*5) | (12+5) |
| 2 | (8*7) | (8+7) |
+----+---------+--------+
(where Area and Area_2 must hold the result of an operation between height and weight from the other DataFrame, matched by id)
Try the following code:
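import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for .toDF; spark-shell already imports this
val df1 = Seq((1, 12, 5), (2, 8, 7)).toDF("id", "height", "weight")
val df2 = Seq(1, 2).toDF("id") // keep only id; the empty Area/Area_2 columns are not needed
// area = height * weight, area2 = height + weight, computed on df1 before the join
val df3 = df1.withColumn("area", col("height") * col("weight")).withColumn("area2", col("height") + col("weight"))
val finaldf = df2.join(df3, Seq("id"))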
scala> finaldf.show
+---+------+------+----+-----+
| id|height|weight|area|area2|
+---+------+------+----+-----+
| 1| 12| 5| 60| 17|
| 2| 8| 7| 56| 15|
+---+------+------+----+-----+
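The question asks for exactly id, Area and Area_2, while the answer above also keeps height and weight and uses the lowercase names area and area2. The sketch below is one way to get the requested shape directly. It assumes a local SparkSession and that the second DataFrame may still carry the empty Area/Area_2 columns (modelled here as null Ints). Those placeholders are dropped, the join runs on id, and select computes and names the two result columns. The object name AreaJoin and the helper computeAreas are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AreaJoin {
  // Hypothetical helper: join on id, then compute Area and Area_2 from height and weight.
  def computeAreas(measures: DataFrame, target: DataFrame): DataFrame =
    target
      .drop("Area", "Area_2") // discard empty placeholder columns; a no-op if they are absent
      .join(measures, Seq("id"))
      .select(
        col("id"),
        (col("height") * col("weight")).as("Area"),
        (col("height") + col("weight")).as("Area_2")
      )

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; in spark-shell, use the existing `spark` instead.
    val spark = SparkSession.builder().master("local[*]").appName("AreaJoin").getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, 12, 5), (2, 8, 7)).toDF("id", "height", "weight")
    // Assumption: the "empty" Area/Area_2 columns are nullable Ints holding null.
    val df2 = Seq(
      (1, Option.empty[Int], Option.empty[Int]),
      (2, Option.empty[Int], Option.empty[Int])
    ).toDF("id", "Area", "Area_2")

    computeAreas(df1, df2).orderBy("id").show()
    spark.stop()
  }
}
```

Running main should print the rows (1, 60, 17) and (2, 56, 15). Because this is an inner join, ids in the second DataFrame with no match in the first are dropped. Passing "left" as the third argument of join would keep them, with null Area and Area_2.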