I have a DataFrame:
+------+------+
| Col1 | col2 |
+------+------+
| A    | A2   |
| A    | A2   |
| B    | b2   |
| B    | b2   |
| C    | c2   |
| D    | d2   |
| E    | e2   |
| F    | f2   |
+------+------+
And another DataFrame:
+------+------+
| Col1 | col2 |
+------+------+
| A    | A2   |
| B    | b2   |
| C    | c2   |
+------+------+
I want to get the rows of the first DataFrame that do not appear in the second one:
+------+------+
| Col1 | col2 |
+------+------+
| D    | d2   |
| E    | e2   |
| F    | f2   |
+------+------+
I tried this:
df1.join(df2,Seq("col1","col2"),"left")
But it doesn't give me that result.
Any ideas? Thank you.
You can use .except, or a left join followed by a filter on null, for this case.
Example (here df is the first DataFrame and df1 is the second):
df.show()
//+----+----+
//|Col1|Col2|
//+----+----+
//| A| A2|
//| A| A2|
//| B| b2|
//| B| b2|
//| C| c2|
//| D| d2|
//| E| e2|
//| F| f2|
//+----+----+
df1.show()
//+----+----+
//|Col1|Col2|
//+----+----+
//| A| A2|
//| B| b2|
//| C| c2|
//+----+----+
// except returns the distinct rows of df that do not appear in df1
df.except(df1).show()
//+----+----+
//|Col1|Col2|
//+----+----+
//| E| e2|
//| F| f2|
//| D| d2|
//+----+----+
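// Left join on both columns, then keep only the rows that found no match in df1.
// Requires: import org.apache.spark.sql.functions.col
df.alias("d1").
  join(df1.alias("d2"),
    (col("d1.Col1") === col("d2.Col1")) && (col("d1.Col2") === col("d2.Col2")),
    "left").
  filter(col("d2.Col2").isNull).
  select("d1.*").
  show()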
//+----+----+
//|Col1|Col2|
//+----+----+
//| D| d2|
//| E| e2|
//| F| f2|
//+----+----+
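The left join plus null filter above can also be written as a single anti join, using Spark's "left_anti" join type. The following is a minimal sketch, not taken from the original answer: the object name AntiJoinSketch, the local SparkSession settings, and the inline sample data are assumptions made so that the example is self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to make the sketch runnable on its own.
object AntiJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the question.
    val df = Seq(
      ("A", "A2"), ("A", "A2"), ("B", "b2"), ("B", "b2"),
      ("C", "c2"), ("D", "d2"), ("E", "e2"), ("F", "f2")
    ).toDF("Col1", "Col2")
    val df1 = Seq(("A", "A2"), ("B", "b2"), ("C", "c2")).toDF("Col1", "Col2")

    // left_anti keeps only the rows of df that have no match in df1
    // on the join columns; no columns from df1 are returned.
    df.join(df1, Seq("Col1", "Col2"), "left_anti").show()

    spark.stop()
  }
}
```

With the question's data this should show the D, E and F rows; the row order is not guaranteed.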
You can use except on the two DataFrames:
scala> df1.except(df2).show
+----+----+
|Col1|col2|
+----+----+
| E| e2|
| F| f2|
| D| d2|
+----+----+
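One point worth checking before choosing except: it behaves like SQL's EXCEPT DISTINCT, so duplicate rows in the result are collapsed, whereas a "left_anti" join keeps them. The sketch below illustrates the difference. The object name ExceptVsAntiJoin and the sample rows (with a duplicated D row) are hypothetical and not part of the question's data.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to make the sketch runnable on its own.
object ExceptVsAntiJoin {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("except-vs-anti-join")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: the (D, d2) row appears twice on the left side.
    val left = Seq(("D", "d2"), ("D", "d2"), ("E", "e2")).toDF("Col1", "Col2")
    val right = Seq(("A", "A2")).toDF("Col1", "Col2")

    // except returns distinct rows, so the duplicate D row is collapsed.
    println(left.except(right).count()) // 2

    // left_anti keeps every unmatched left row, duplicates included.
    println(left.join(right, Seq("Col1", "Col2"), "left_anti").count()) // 3

    spark.stop()
  }
}
```

For the data in the question both approaches give the same three rows, because D, E and F each appear only once.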