How can I perform filter operation on Dataframe1 using Dataframe2. I want to remove rows from DataFrame1 for below matching condition
Dataframe1.col1 = Dataframe2.col1
Dataframe1.col2 = Dataframe2.col2
My question is different than substract two dataframes because while substract we use all columns but in my question I want to use limited number of columns
join with " left_anti "
scala> df1.show
+----+-----+-----+
|col1| col2| col3|
+----+-----+-----+
| 1| one| ek|
| 2| two| dho|
| 3|three|theen|
| 4| four|chaar|
+----+-----+-----+
scala> df2.show
+----+----+-----+
|col1|col2| col3|
+----+----+-----+
| 2| two| dho|
| 4|four|chaar|
+----+----+-----+
scala> df1.join(df2, Seq("col1", "col2"), "left_anti").show
+----+-----+-----+
|col1| col2| col3|
+----+-----+-----+
| 1| one| ek|
| 3|three|theen|
+----+-----+-----+
Possible duplicate of : Spark: subtract two DataFrames if both datasets have exact same coulmns
If you want custom join condition then you can use "anti" join. Here is the pysaprk version
Creating two data frames:
Dataframe1 :
l1 = [('col1_row1', 10), ('col1_row2', 20), ('col1_row3', 30)
df1 = spark.createDataFrame(l1).toDF('col1','col2')
df1.show()
+---------+----+
| col1|col2|
+---------+----+
|col1_row1| 10|
|col1_row2| 20|
|col1_row3| 30|
+---------+----+
Dataframe2 :
l2 = [('col1_row1', 10), ('col1_row2', 20), ('col1_row4', 40)]
df2 = spark.createDataFrame(l2).toDF('col1','col2')
df2.show()
+---------+----+
| col1|col2|
+---------+----+
|col1_row1| 10|
|col1_row2| 20|
|col1_row4| 40|
+---------+----+
Using subtract api :
df_final = df1.subtract(df2)
df_final.show()
+---------+----+
| col1|col2|
+---------+----+
|col1_row3| 30|
+---------+----+
Using left_anti :
Join condition:
join_condition = [df1["col1"] == df2["col1"], df1["col2"] == df2["col2"]]
Join finally
df_final = df1.join(df2, join_condition, 'left_anti')
df_final.show()
+---------+----+
| col1|col2|
+---------+----+
|col1_row3| 30|
+---------+----+
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.