
SQL query/Spark dataframe to outer join and subtract values of two tables

I'm looking to do an outer join on two tables A and B based on 'name', and then subtract B's 'count' value from A's, substituting 0 if the row doesn't exist in the other table. Does anyone know a simple SQL query to make this possible?

A
name count
ABC 10
DEF 10
GHI 20

B
name count
ABC 20
GHI 30
XYZ 10

RESULT
name count
ABC -10
DEF 10
GHI -10
XYZ -10

Thanks!

(Or if there is a way to do this with Spark DataFrames, that would be great as well!)

With Spark, you can do a full outer join of the two data frames on the name column, coalesce null counts to zero, and then subtract B.count from A.count:

(A.alias("a").join(B.alias("b"), Seq("name"), "outer")
  .selectExpr("name", "coalesce(a.count, 0) - coalesce(b.count, 0) as count")).show
+----+-----+
|name|count|
+----+-----+
| DEF|   10|
| GHI|  -10|
| XYZ|  -10|
| ABC|  -10|
+----+-----+
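
Since the question also asks for a plain SQL query, the same logic can be sketched with Spark SQL. This is a minimal sketch, not a definitive implementation: it assumes a local SparkSession, rebuilds the sample data from the question, and registers the two data frames as temporary views named A and B (the view names and the `result` variable are illustrative choices). The `ORDER BY` is added only to make the output order deterministic.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("outer-join-subtract")
  .master("local[*]")
  .getOrCreate()

// Sample data copied from the question.
val A = spark.createDataFrame(Seq(("ABC", 10), ("DEF", 10), ("GHI", 20))).toDF("name", "count")
val B = spark.createDataFrame(Seq(("ABC", 20), ("GHI", 30), ("XYZ", 10))).toDF("name", "count")

// Register the data frames as views so SQL can refer to them by name.
A.createOrReplaceTempView("A")
B.createOrReplaceTempView("B")

// FULL OUTER JOIN keeps names that appear in only one table;
// COALESCE fills in the missing name and treats a missing count as 0.
val result = spark.sql("""
  SELECT COALESCE(a.name, b.name) AS name,
         COALESCE(a.`count`, 0) - COALESCE(b.`count`, 0) AS `count`
  FROM A a
  FULL OUTER JOIN B b ON a.name = b.name
  ORDER BY name
""")

result.show()
```

The query itself is standard SQL apart from the backticks around `count`, which are used here only to avoid any clash with the aggregate function of the same name. On a database without FULL OUTER JOIN support, a different formulation would be needed.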
