How to diff two tables using Spark SQL?

I need to diff two tables using Spark SQL. I found a SQL Server answer like this:

(SELECT *
 FROM   table1
 EXCEPT
 SELECT *
 FROM   table2)
UNION ALL
(SELECT *
 FROM   table2
 EXCEPT
 SELECT *
 FROM   table1) 

Can someone tell me how to do the same thing in Spark SQL as in SQL Server? (I don't need specific columns, just use *.)
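For reference, that query returns every row that appears in exactly one of the two tables (their symmetric difference). The sketch below is plain Scala, not Spark: it models rows as tuples in a `Set`, which mirrors the fact that `EXCEPT` returns distinct rows, and the sample rows are illustrative.

```scala
// Rows modeled as (a, b) tuples; Set semantics mirror EXCEPT, which returns distinct rows
val table1 = Set((1, 2), (3, 4))
val table2 = Set((1, 2), (5, 6))

// table1 EXCEPT table2
val onlyIn1 = table1 -- table2
// table2 EXCEPT table1
val onlyIn2 = table2 -- table1

// UNION ALL of the two parts; they are disjoint, so no row can appear twice
val diff = onlyIn1.toList ++ onlyIn2.toList
println(diff)
```

Because a row cannot be both "only in table1" and "only in table2", `UNION ALL` and `UNION` give the same result here; `UNION ALL` just skips the extra deduplication step.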

You can do it like this:

scala> val df1=sc.parallelize(Seq((1,2),(3,4))).toDF("a","b")
df1: org.apache.spark.sql.DataFrame = [a: int, b: int]

scala> val df2=sc.parallelize(Seq((1,2),(5,6))).toDF("a","b")
df2: org.apache.spark.sql.DataFrame = [a: int, b: int]

scala> df1.createTempView("table1")

scala> df2.createTempView("table2")

scala> spark.sql("select * from table1 EXCEPT select * from table2").show
+---+---+                                                                       
|  a|  b|
+---+---+
|  3|  4|
+---+---+


scala> spark.sql("(select * from table2 EXCEPT select * from table1) UNION ALL (select * from table1 EXCEPT select * from table2)").show
+---+---+                                                                       
|  a|  b|
+---+---+
|  5|  6|
|  3|  4|
+---+---+
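The same result can also be obtained with the DataFrame API instead of an SQL string. A minimal sketch, assuming Spark 2.x or 3.x (for example pasted into `spark-shell`, where `getOrCreate()` returns the existing session); the helper name `symmetricDiff` is hypothetical, while `except` and `union` are standard `Dataset` methods.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Reuses the active session in spark-shell, or starts a local one otherwise
val spark = SparkSession.builder().master("local[*]").appName("table-diff").getOrCreate()
import spark.implicits._

val df1 = Seq((1, 2), (3, 4)).toDF("a", "b")
val df2 = Seq((1, 2), (5, 6)).toDF("a", "b")

// Hypothetical helper: rows present in exactly one of the two DataFrames.
// except() behaves like SQL EXCEPT (distinct rows); union() behaves like UNION ALL.
def symmetricDiff(left: DataFrame, right: DataFrame): DataFrame =
  left.except(right).union(right.except(left))

val diff = symmetricDiff(df1, df2)
diff.show()
```

Both `except` and `union` match columns by position, so the two tables must list their columns in the same order with compatible types. If duplicate rows matter, Spark 2.4+ also offers `exceptAll`, which keeps duplicates like SQL `EXCEPT ALL`.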

Note: In your case, you have to create a DataFrame from each JDBC call, register each one as a temporary view, and then run the query.
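A sketch of that JDBC workflow, assuming the database's JDBC driver is on the Spark classpath. The helpers `loadTable` and `diffViaSql` are hypothetical names, and the connection URL, credentials, and table names in the commented usage are placeholders; the last lines run the same query on in-memory DataFrames so the logic can be checked without a database.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

// Reuses the active session in spark-shell, or starts a local one otherwise
val spark = SparkSession.builder().master("local[*]").appName("jdbc-table-diff").getOrCreate()
import spark.implicits._

// Hypothetical helper: read a whole table over JDBC into a DataFrame
def loadTable(url: String, table: String, props: Properties): DataFrame =
  spark.read.jdbc(url, table, props)

// Hypothetical helper: register both inputs as temp views, then run the symmetric-difference query
def diffViaSql(left: DataFrame, right: DataFrame): DataFrame = {
  left.createOrReplaceTempView("table1")
  right.createOrReplaceTempView("table2")
  spark.sql(
    """(SELECT * FROM table1 EXCEPT SELECT * FROM table2)
      |UNION ALL
      |(SELECT * FROM table2 EXCEPT SELECT * FROM table1)""".stripMargin)
}

// Usage against a real database (placeholders, not executed here):
// val props = new Properties()
// props.setProperty("user", "<user>")
// props.setProperty("password", "<password>")
// val url = "jdbc:sqlserver://<host>:1433;databaseName=<db>"
// diffViaSql(loadTable(url, "<table1>", props), loadTable(url, "<table2>", props)).show()

// Local check without a database, using in-memory DataFrames
val sample = diffViaSql(Seq((1, 2), (3, 4)).toDF("a", "b"), Seq((1, 2), (5, 6)).toDF("a", "b"))
sample.show()
```

Using `createOrReplaceTempView` rather than `createTempView` lets the helper be called repeatedly in the same session without failing because the view names already exist.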
