
python - How to select the rows that differ between two dataframes with different columns

I have two dataframes, and df2 has more columns than df1.

If a row in df1 does not appear in df2, I want to select it into df3.

df1

    id  colA colB
0   1   4    1
1   2   5    2
2   3   2    4
3   4   4    2
4   5   2    4

df2

    id  colA colB colC
0   1   4    1    0
1   2   5    2    0
2   5   2    4    0

I want to select the rows of df1 that are not in df2:

df3

    id  colA colB
0   3   2    4
1   4   4    2

Assuming you are comparing on the 'id' column (if not, please clarify), you can use Series.isin with boolean indexing.

>>> df3 = df1[~df1['id'].isin(df2['id'])]
>>> df3
   id  colA  colB
2   3     2     4
3   4     4     2
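If the comparison should cover every column the two frames share rather than only 'id', one possible approach is a left merge with indicator=True (an anti-join). The sketch below rebuilds the question's data; comparing on all of df1's columns is an assumption about what "row not in df2" means, and the names keys and merged are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'colA': [4, 5, 2, 4, 2],
                    'colB': [1, 2, 4, 2, 4]})
df2 = pd.DataFrame({'id': [1, 2, 5],
                    'colA': [4, 5, 2],
                    'colB': [1, 2, 4],
                    'colC': [0, 0, 0]})

# Assumption: a row "exists in df2" when all of df1's columns match.
keys = list(df1.columns)

# Left merge against df2's de-duplicated key columns. The '_merge' column
# added by indicator=True is 'left_only' for df1 rows with no match in df2.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left', indicator=True)

df3 = (merged[merged['_merge'] == 'left_only']
       .drop(columns='_merge')
       .reset_index(drop=True))
print(df3)
```

With the sample data this selects the rows with id 3 and 4, the same rows as the isin filter, because rows that share an id here also share colA and colB.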

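Use drop_duplicates. Note that this works here because every id in df2 also appears in df1; an id found only in df2 would be kept in the result as well: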

import pandas as pd

df1 = pd.DataFrame({'id': [1,2,3,4,5],
                    'colA':[4,5,2,4,2],
                    'colB':[1,2,4,2,4]})

df2 = pd.DataFrame({'id': [1,2,5],
                    'colA':[4,5,2],
                    'colB':[1,2,4]})

pd.concat([df1,df2]).drop_duplicates(subset='id',keep=False)

Output:

   id  colA  colB
2   3     2     4
3   4     4     2

df3 = df1.loc[~df1['id'].isin(list(df2['id']))]

Output:

   id  colA  colB
2   3     2     4
3   4     4     2
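The outputs above keep df1's original row labels (2 and 3), whereas the df3 shown in the question is numbered from 0. A minimal sketch of one way to get that numbering, assuming the comparison is on 'id' only, is to chain reset_index(drop=True) onto the filter:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'colA': [4, 5, 2, 4, 2],
                    'colB': [1, 2, 4, 2, 4]})
df2 = pd.DataFrame({'id': [1, 2, 5],
                    'colA': [4, 5, 2],
                    'colB': [1, 2, 4],
                    'colC': [0, 0, 0]})

# Keep the df1 rows whose id is absent from df2, then renumber the index from 0.
# drop=True discards the old labels instead of adding them as a column.
df3 = df1[~df1['id'].isin(df2['id'])].reset_index(drop=True)
print(df3)
```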
