Now I need to diff two table using spark sql,i find a sql server's answer like this :
(SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2)
UNION ALL
(SELECT *
FROM table2
EXCEPT
SELECT *
FROM table1)
Hope somebody can tell me how to using spark sql like this in sql server? (Do not care the special col ,just use *)
You can do this something like this :
scala> val df1=sc.parallelize(Seq((1,2),(3,4))).toDF("a","b")
df1: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> val df2=sc.parallelize(Seq((1,2),(5,6))).toDF("a","b")
df2: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> df1.create
createOrReplaceTempView createTempView
scala> df1.createTempView("table1")
scala> df2.createTempView("table2")
scala> spark.sql("select * from table1 EXCEPT select * from table2").show
+---+---+
| a| b|
+---+---+
| 3| 4|
+---+---+
scala> spark.sql("(select * from table2 EXCEPT select * from table1) UNION ALL (select * from table1 EXCEPT select * from table2)").show
+---+---+
| a| b|
+---+---+
| 5| 6|
| 3| 4|
+---+---+
Note : In your case you have to make dataframe out of the JDBC calls and then register the table and perform the operations.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.