How to remove rows from Pandas dataframe if the same row exists in another dataframe but end up with all columns from both df

Question

I have two different Pandas data-frames that have one column in common. I have seen similar questions on Stack overflow but none that seem to end up with the columns from both dataframes so please read below before marking as duplicate.

Example:

dataframe 1

ID  col1 col2  ...
1    9    5
2    8    4
3    7    3 
4    6    2

dataframe 2

ID  col3  col4  ...
3    11     15
4    12     16
7    13     17

What I want to achieve is a dataframe with columns from both dataframes but without the ID's found in dataframe2. ie:

desired result:

ID  col1 col2  col3  col4
1    9    5     -     -
2    8    4     -     -

Thanks!

Answer 1

You can use a left join to get only the id 's in the first data frame and not the second data frame while also keeping all the second data frames columns.

import pandas as pd

df1 = pd.DataFrame(
    data={"id": [1, 2, 3, 4], "col1": [9, 8, 7, 6], "col2": [5, 4, 3, 2]},
    columns=["id", "col1", "col2"],
)
df2 = pd.DataFrame(
    data={"id": [3, 4, 7], "col3": [11, 12, 13], "col4": [15, 16, 17]},
    columns=["id", "col3", "col4"],
)

df_1_2 = df1.merge(df2, on="id", how="left", indicator=True)

df_1_not_2 = df_1_2[df_1_2["_merge"] == "left_only"].drop(columns=["_merge"])

which returns

   id  col1  col2  col3  col4
0   1     9     5   NaN   NaN
1   2     8     4   NaN   NaN

Answer 2

Looks like a simple drop will work for what you want:

df1.drop(df2.index, errors='ignore', axis=0)

     col1  col2
ID            
1      9     5
2      8     4

Note that this assumes that ID is the index, otherwise use .isin :

df1[~df1.ID.isin(df2.ID)]

    ID  col1  col2
0   1     9     5
1   2     8     4

How to remove rows from Pandas dataframe if the same row exists in another dataframe but end up with all columns from both df

Question

2 answers

solution1
2 ACCPTED 2019-01-16 14:34:09

solution2
1 2019-01-16 14:22:16

How to remove rows from Pandas dataframe if the same row exists in another dataframe but end up with all columns from both df

Question

2 answers

solution1 2 ACCPTED 2019-01-16 14:34:09

solution2 1 2019-01-16 14:22:16

solution1
2 ACCPTED 2019-01-16 14:34:09

solution2
1 2019-01-16 14:22:16