
Pandas dataframe in python: Removing rows from df1 based on rows in df2

I have two dataframes:

df1: contains all information
rowname a  b  c  d
R1      1  2  0  1
R2      2  2  0  1
R3      0  2  0  0
R4      1  2  0  1

df2: contains a subset of the rows and columns:
rowname a  b  c  
R1      1  2  0  
R2      2  2  0   
R4      1  2  0 

I want to filter out all the rows in df1 that are not in df2. In this case, that means dropping R3 from df1 while keeping all of df1's columns.

I think df1.merge(df2, ...) could do this, but I've tried a variety of arguments with no success. I'm using Python 3.

Simply filter the dataframe using isin():

df1[df1.rowname.isin(df2.rowname)]

  rowname  a  b  c  d
0      R1  1  2  0  1
1      R2  2  2  0  1
3      R4  1  2  0  1
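
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the two frames from the question and assumes rowname is an ordinary column rather than the index (the question does not say which); the names mask and filtered are only illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "rowname": ["R1", "R2", "R3", "R4"],
    "a": [1, 2, 0, 1],
    "b": [2, 2, 2, 2],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 0, 1],
})
df2 = pd.DataFrame({
    "rowname": ["R1", "R2", "R4"],
    "a": [1, 2, 1],
    "b": [2, 2, 2],
    "c": [0, 0, 0],
})

# Boolean mask: True where df1's rowname also appears in df2.
mask = df1["rowname"].isin(df2["rowname"])

# Boolean indexing keeps the matching rows, all columns, and the original index labels.
filtered = df1[mask]
print(filtered)
```

If rowname were the index instead, df1[df1.index.isin(df2.index)] should behave the same way.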

Here is one way, which matches rows only on columns ['a', 'b', 'c']:

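import pandas as pd

# Stack both frames; df2 has no 'd' column, so its rows get NaN there.
df = pd.concat([df1, df2])

# Keep rows whose (a, b, c) values occur more than once, then drop df2's copies (NaN in 'd').
# Caveat: rows duplicated within df1 itself are kept even if df2 lacks them.
df = df.loc[df.duplicated(['a', 'b', 'c'], keep=False)]\
       .dropna(subset=['d'], axis=0)

# The NaN values turned 'd' into floats; convert it back to int.
df['d'] = df['d'].astype(int)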

Result:

   a  b  c  d rowname
0  1  2  0  1      R1
1  2  2  0  1      R2
3  1  2  0  1      R4
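
Since the question asks about merge, an inner merge on the same three columns is another option. This is a sketch rather than a definitive answer: it assumes the frames look like those in the question, and the names keys and matched are only illustrative. Unlike the approaches above, merge produces a fresh 0..n-1 index.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "rowname": ["R1", "R2", "R3", "R4"],
    "a": [1, 2, 0, 1],
    "b": [2, 2, 2, 2],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 0, 1],
})
df2 = pd.DataFrame({
    "rowname": ["R1", "R2", "R4"],
    "a": [1, 2, 1],
    "b": [2, 2, 2],
    "c": [0, 0, 0],
})

keys = ["a", "b", "c"]

# Use only the key columns of df2, de-duplicated, so that repeated
# (a, b, c) combinations in df2 do not multiply rows of df1.
matched = df1.merge(df2[keys].drop_duplicates(), on=keys, how="inner")
print(matched)
```

Here a row of df1 survives only if its (a, b, c) combination is actually present in df2, so rows that merely duplicate each other inside df1 are not kept by accident.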


 