
Spark Scala: perform mathematical operations over columns in two DataFrames

We have two DataFrames:

+----+--------+--------+
| id | height | weight |
+----+--------+--------+
|  1 |     12 |      5 |
|  2 |      8 |      7 |
+----+--------+--------+

and another one:

+----+------+--------+
| id | Area | Area_2 |
+----+------+--------+
|  1 |      |        |
|  2 |      |        |
+----+------+--------+

I need to join the two DataFrames on id so that the second DataFrame ends up like this:

+----+---------+--------+
| id |  Area   | Area_2 |
+----+---------+--------+
|  1 | (12*5)  | (12+5) |
|  2 | (8*7)   | (8+7)  |
+----+---------+--------+

(where Area and Area_2 must hold the results of operations between height and weight from the other DataFrame, matched by id)

Try the code below:

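  import org.apache.spark.sql.functions.col

  // Source values; the second DataFrame only needs the id column
  val df1 = Seq((1, 12, 5), (2, 8, 7)).toDF("id", "height", "weight")
  val df2 = Seq(1, 2).toDF("id") // Area and Area_2 columns dropped from this df
  // Compute both derived columns on df1, then join on id
  val df3 = df1.withColumn("area", col("height") * col("weight")).withColumn("area2", col("height") + col("weight"))
  val finaldf = df2.join(df3, Seq("id"))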

  scala> finaldf.show
  +---+------+------+----+-----+
  | id|height|weight|area|area2|
  +---+------+------+----+-----+
  |  1|    12|     5|  60|   17|
  |  2|     8|     7|  56|   15|
  +---+------+------+----+-----+
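
The result above still carries the height and weight columns. If only id, Area and Area_2 are wanted, as in the question, one possible variation is to join first and compute the two columns inside a select. This is a minimal sketch rather than the answerer's code: the object name AreaJoinSketch, the helper withAreas, and the local SparkSession are assumptions made for illustration (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AreaJoinSketch {
  // Hypothetical helper: joins the id-only DataFrame with the measurements
  // on "id" and keeps only id plus the two derived columns.
  def withAreas(ids: DataFrame, measures: DataFrame): DataFrame =
    ids
      .join(measures, Seq("id"))
      .select(
        col("id"),
        (col("height") * col("weight")).as("Area"),
        (col("height") + col("weight")).as("Area_2")
      )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AreaJoinSketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, 12, 5), (2, 8, 7)).toDF("id", "height", "weight")
    val df2 = Seq(1, 2).toDF("id")

    // Prints a three-column table: id, Area, Area_2
    withAreas(df2, df1).orderBy("id").show()

    spark.stop()
  }
}
```

With the sample data from the question, the rows should be (1, 60, 17) and (2, 56, 15). The default inner join drops any id that appears in only one DataFrame; passing "left" as a third argument to join would keep every id from the first DataFrame, with nulls where no measurements exist.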
