I have the following two dataframes:
df1:
date id
2000 1
2001 1
2002 2
df2:
date id
2000 1
2002 2
I now want to extract a list of observations that are in df1 but not in df2 based on date AND id.
The result should look like this:
date id
2001 1
I know how to write a command that compares a column to a list with isin, like this:
result = df1[~df1["id"].isin(df2["id"].tolist())]
However, this only compares the two dataframes on the column id. An id can appear in both df1 and df2 but with different dates, so a row should only count as present in df2 when both its id and its date match. Does somebody know how to do that?
Using merge
In [795]: (df1.merge(df2, how='left', indicator='_a')
              .query('_a == "left_only"')
              .drop(columns='_a'))
Out[795]:
date id
1 2001 1
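For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. It rebuilds the two frames from the question and passes on=["date", "id"] explicitly. That argument is my addition: the call above relies on merge joining on all shared columns by default, which gives the same result here because date and id are the only columns.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"date": [2000, 2001, 2002], "id": [1, 1, 2]})
df2 = pd.DataFrame({"date": [2000, 2002], "id": [1, 2]})

# Left-join on both key columns. The indicator column "_a" records whether
# each row of df1 found a match in df2 ("both") or not ("left_only").
merged = df1.merge(df2, on=["date", "id"], how="left", indicator="_a")

# Keep only the rows without a match in df2, then drop the helper column.
result = merged[merged["_a"] == "left_only"].drop(columns="_a")
print(result)
```

Naming the key columns explicitly is probably the safer habit if the frames carry other columns that should not take part in the comparison.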
Details
In [796]: df1.merge(df2, how='left', indicator='_a')
Out[796]:
date id _a
0 2000 1 both
1 2001 1 left_only
2 2002 2 both
In [797]: df1.merge(df2, how='left', indicator='_a').query('_a == "left_only"')
Out[797]:
date id _a
1 2001 1 left_only
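Since the question started from isin, a variant closer to that attempt may also be worth sketching. This is a different technique from the merge above, not part of it: it builds a MultiIndex of (date, id) pairs for each frame and tests membership of whole pairs instead of single columns. The names keys, left_keys, right_keys and result_isin are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"date": [2000, 2001, 2002], "id": [1, 1, 2]})
df2 = pd.DataFrame({"date": [2000, 2002], "id": [1, 2]})

keys = ["date", "id"]

# Turn the key columns of each frame into an index of (date, id) pairs.
left_keys = pd.MultiIndex.from_frame(df1[keys])
right_keys = pd.MultiIndex.from_frame(df2[keys])

# True where the full (date, id) pair of a df1 row also occurs in df2.
mask = left_keys.isin(right_keys)

# Negate the mask to keep the rows of df1 that have no counterpart in df2.
result_isin = df1[~mask]
print(result_isin)
```

Unlike the merge, this keeps the original index of df1 and never adds a helper column, which may be convenient when df1 has a meaningful index.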