Spark Scala join dataframe subtract column values

I have two dataframes, each with 2 columns. I want to join them on their 1st column and subtract their 2nd columns. Here's what I have so far:

var x = df.select("a", "c")
          .groupBy("a")
          .count()
var y = df.select("b", "c")
          .groupBy("b")
          .count()
var z = x.join(y, x("a") === y("b"))

How do I perform the subtraction on the joined dataframe? Without dataframes, I would usually use mapValues { case .. => .. }. Thanks

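import org.apache.spark.sql.functions.sum
import spark.implicits._  // assumes a SparkSession named `spark`; enables the $"col" syntax

// Sum c per value of a, and separately per value of b
val x = df.groupBy("a")
          .agg(sum("c").as("c1"))
          .select("a", "c1")
val y = df.groupBy("b")
          .agg(sum("c").as("c2"))
          .select("b", "c2")
// Inner join on a === b, then subtract the two sums ("diff" is an illustrative column name)
val z = x.join(y, $"a" === $"b")
         .select($"a", ($"c1" - $"c2").as("diff"))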
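
To check what the join-and-subtract should produce without a Spark session, the same logic can be sketched on plain Scala collections. This is only an illustration under stated assumptions: the sample rows and the names `rows`, `sumByA`, `sumByB` and `diff` are hypothetical and do not come from the question, and the column types (String keys, Long values) are guesses.

```scala
// Hypothetical sample rows (a, b, c); not taken from the original question.
val rows = Seq(
  ("k1", "k1", 10L),
  ("k1", "k2", 5L),
  ("k2", "k1", 3L)
)

// Sum of c per value of a, and per value of b (mirrors the two groupBy/agg steps).
val sumByA: Map[String, Long] =
  rows.groupBy(_._1).map { case (k, rs) => k -> rs.map(_._3).sum }
val sumByB: Map[String, Long] =
  rows.groupBy(_._2).map { case (k, rs) => k -> rs.map(_._3).sum }

// Inner join on the key, then subtract: only keys present on both sides are kept.
val diff: Map[String, Long] =
  sumByA.collect { case (k, c1) if sumByB.contains(k) => k -> (c1 - sumByB(k)) }
```

As with the inner join in the dataframe version, a key that appears in only one of the two columns is dropped from the result; an outer join with a default of 0 would be needed to keep it.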
