
Python: compare dataframes based on two conditions

I have the following two dataframes:

df1:

date   id 
2000   1
2001   1
2002   2

df2:

date   id 
2000   1
2002   2

I now want to extract a list of observations that are in df1 but not in df2 based on date AND id.

The result should look like this:

date id
2001  1

I know how to compare a column against a list with isin, like this:

result = df1[~df1["id"].isin(df2["id"].tolist())]

However, this compares the two dataframes on the id column only. An id can appear in both df1 and df2 but with different dates, so the comparison has to use the combination of id and date: a row of df1 should be kept only if no row of df2 has the same date and the same id. Does anybody know how to do that?

Using merge

In [795]: (df1.merge(df2, how='left', indicator='_a')
              .query('_a == "left_only"')
              .drop(columns='_a'))
Out[795]:
   date  id
1  2001   1
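
Outside an interactive session, the same anti-join could be wrapped in a small function. This is only a sketch: the helper name `rows_not_in` and the explicit `keys` argument are assumptions added for illustration, not part of the answer above. Passing `on=keys` names the comparison columns explicitly instead of relying on merge's default of all shared columns.

```python
import pandas as pd


def rows_not_in(left, right, keys):
    """Return rows of `left` whose `keys` combination does not occur in `right`.

    Hypothetical helper, shown for illustration only.
    """
    # Deduplicate the right-hand keys so the merge cannot repeat a left row.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


df1 = pd.DataFrame({"date": [2000, 2001, 2002], "id": [1, 1, 2]})
df2 = pd.DataFrame({"date": [2000, 2002], "id": [1, 2]})

result = rows_not_in(df1, df2, ["date", "id"])
print(result)
```

Note that the merge builds a fresh index for its result, so the row labels match those of df1 only when df1 has the default 0..n-1 index, as it does here.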

Details

In [796]: df1.merge(df2, how='left', indicator='_a')
Out[796]:
   date  id         _a
0  2000   1       both
1  2001   1  left_only
2  2002   2       both

In [797]: df1.merge(df2, how='left', indicator='_a').query('_a == "left_only"')
Out[797]:
   date  id         _a
1  2001   1  left_only
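
As an alternative closer to the isin attempt in the question, the two columns can be combined into a MultiIndex so that isin tests (date, id) pairs rather than a single column. This is a sketch of a different technique from the merge shown above; the variable names `keys`, `pairs1`, `pairs2` and `mask` are illustrative assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({"date": [2000, 2001, 2002], "id": [1, 1, 2]})
df2 = pd.DataFrame({"date": [2000, 2002], "id": [1, 2]})

keys = ["date", "id"]

# Each MultiIndex entry is one (date, id) pair taken from a row.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])

# True where the (date, id) pair of a df1 row is absent from df2.
mask = ~pairs1.isin(pairs2)

result = df1[mask]
print(result)
```

Because the boolean mask is applied directly to df1, the result keeps df1's original row labels and any additional columns df1 might have.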
