SQL query/Spark dataframe to outer join and subtract values of two tables
I'm looking to do an outer join on two tables A and B on the 'name' column, and then subtract the 'count' values, substituting 0 when a row doesn't exist in the other table. Does anyone know a simple SQL query that does this?
A
name count
ABC  10
DEF  10
GHI  20

B
name count
ABC  20
GHI  30
XYZ  10

RESULT
name count
ABC  -10
DEF  10
GHI  -10
XYZ  -10
Thanks!
(Or if there is a way to do this with Spark DataFrames, that would be great as well!)
With Spark, you can outer join the two data frames on the name column, coalesce a null count to zero, and then subtract B.count from A.count:
(A.alias("a").join(B.alias("b"), Seq("name"), "outer")
.selectExpr("name", "coalesce(a.count, 0) - coalesce(b.count, 0) as count")).show
+----+-----+
|name|count|
+----+-----+
| DEF| 10|
| GHI| -10|
| XYZ| -10|
| ABC| -10|
+----+-----+
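Since the question also asks for a plain SQL solution, here is a sketch of the same full-outer-join-and-subtract query. It uses Python's sqlite3 only to make it self-contained and runnable; because some engines (and older SQLite versions) lack FULL OUTER JOIN, the query below uses the portable emulation, a LEFT JOIN unioned with the unmatched rows of the other side. On an engine with FULL OUTER JOIN you could instead write SELECT COALESCE(a.name, b.name), COALESCE(a.count, 0) - COALESCE(b.count, 0) directly.

```python
import sqlite3

# In-memory database seeded with the sample tables A and B from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (name TEXT, count INTEGER);
    CREATE TABLE B (name TEXT, count INTEGER);
    INSERT INTO A VALUES ('ABC', 10), ('DEF', 10), ('GHI', 20);
    INSERT INTO B VALUES ('ABC', 20), ('GHI', 30), ('XYZ', 10);
""")

# Emulated FULL OUTER JOIN:
#  - first branch: every A row, minus B's count (0 if B has no match);
#  - second branch: B rows with no A match, so the result is 0 - B.count.
query = """
    SELECT a.name AS name,
           a.count - COALESCE(b.count, 0) AS count
    FROM A a LEFT JOIN B b ON a.name = b.name
    UNION ALL
    SELECT b.name, -b.count
    FROM B b LEFT JOIN A a ON a.name = b.name
    WHERE a.name IS NULL
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('ABC', -10), ('DEF', 10), ('GHI', -10), ('XYZ', -10)]
```

The output matches the RESULT table above; only the row order differs from the Spark output because of the explicit ORDER BY.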