Pandas dataframe in python: Removing rows from df1 based on rows in df2

Question

I have two dataframes:

df1: contains all information
rowname a  b  c  d
R1      1  2  0  1
R2      2  2  0  1
R3      0  2  0  0
R4      1  2  0  1

df2: contains a subset of the rows and columns:
rowname a  b  c  
R1      1  2  0  
R2      2  2  0   
R4      1  2  0

I want to filter out all the rows of df1 that are not in df2. So for this case, I'm looking to get rid of R3 in df1 while keeping all of df1's columns.

I think df1.merge(df2, ...) could make this happen, but I've tried a variety of arguments with no success. I'm using Python 3.

Answer 1

Simply filter the dataframe using isin():

df1[df1.rowname.isin(df2.rowname)]

  rowname  a  b  c  d
0      R1  1  2  0  1
1      R2  2  2  0  1
3      R4  1  2  0  1
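
For anyone who wants to run this end to end, here is a minimal sketch. It assumes `rowname` is an ordinary column (not the index); the frame construction is rebuilt from the tables in the question, and the names `mask` and `filtered` are only illustrative.

```python
import pandas as pd

# Frames rebuilt from the question's tables (assumes rowname is a regular column)
df1 = pd.DataFrame({
    "rowname": ["R1", "R2", "R3", "R4"],
    "a": [1, 2, 0, 1],
    "b": [2, 2, 2, 2],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 0, 1],
})
df2 = pd.DataFrame({
    "rowname": ["R1", "R2", "R4"],
    "a": [1, 2, 1],
    "b": [2, 2, 2],
    "c": [0, 0, 0],
})

# Boolean mask: True where df1's rowname also appears in df2
mask = df1["rowname"].isin(df2["rowname"])

# Keep only the matching rows; all of df1's columns are preserved
filtered = df1[mask]
print(filtered)
```

If `rowname` is actually the index of both frames, `df1[df1.index.isin(df2.index)]` should behave the same way.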

Answer 2

This is one way; it matches rows only on the columns ['a', 'b', 'c'].

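import pandas as pd

# Stack both frames; rows that came from df2 have NaN in column 'd'
df = pd.concat([df1, df2])

# Keep rows whose (a, b, c) occurs more than once, then drop the df2 copies
df = df.loc[df.duplicated(['a', 'b', 'c'], keep=False)]\
       .dropna(subset=['d'], axis=0)

# 'd' was upcast to float because of the NaNs; cast it back to int
df['d'] = df['d'].astype(int)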

Result:

   a  b  c  d rowname
0  1  2  0  1      R1
1  2  2  0  1      R2
3  1  2  0  1      R4

2 answers

solution1
1 ACCEPTED 2018-03-02 22:02:37

solution2
0 2018-03-02 22:03:32
