简体   繁体   中英

Subset pandas DataFrame based on two columns in another DataFrame

I have the pandas DataFrame as shown below. df1 and df2 are both subsets of df . I want to now create two new subsets of df , let's call them dftrn and dftst . dftrn should contain all values of df that are in df1 and df2 . dftst should contain values of df that are not in df1 and df2 . What's the quickest way to do this?

df = pd.DataFrame({
    'product': ['prod1', 'prod1', 'prod1', 'prod2', 'prod2', 'prod2'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-04-01'],
    'value': [5.1, 5.2, 5.4, 2.3, 2.2, 2.4]
})

df1 = pd.DataFrame({
    'product': ['prod1', 'prod1'],
    'date': ['2017-02-01', '2017-03-01'],
    'value': [5.1, 5.4]
})

df2 = pd.DataFrame({
    'product': ['prod2', 'prod2'],
    'date': ['2017-02-01', '2017-04-01'],
    'value': [2.3, 2.4]
})

What I want dftrn and dftst to look like (indices don't matter):

dftrn
    date    product value
0   2017-02-01  prod1   5.2
1   2017-03-01  prod1   5.4
2   2017-02-01  prod2   2.3
3   2017-04-01  prod2   2.4

dftst
date    product value
0   2017-01-01  prod1   5.1
1   2017-03-01  prod2   2.2

Using concat with isin

s=pd.concat([df1,df2])
df[df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[77]: 
         date product  value
2  2017-03-01   prod1    5.4
3  2017-02-01   prod2    2.3
5  2017-04-01   prod2    2.4
df[~df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[78]: 
         date product  value
0  2017-01-01   prod1    5.1
1  2017-02-01   prod1    5.2
4  2017-03-01   prod2    2.2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM