Constructing pandas dataframe with rows conditional on their not existing in another dataframe python

I have a pandas dataframe

df
store    day   items
 a        1     4
 a        1     3
 a        2     1
 a        3     5
 a        4     2 
 a        5     9
 b        1     1 
 b        2     3

I have another pandas dataframe, temp, that is the Cartesian product of all unique store and day values. It looks like this:

    store  day  
0     a    1     
1     a    2      
2     a    3      
3     a    4      
4     a    5      
5     b    1      
6     b    2      
7     b    3    
8     b    4    
9     b    5    
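For reference, a frame like temp could be built as in the sketch below. This is only one possible construction, assuming the days of interest are exactly the unique values of df['day']; the name full_index is made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
    "items": [4, 3, 1, 5, 2, 9, 1, 3],
})

# Pair every unique store with every unique day (assumption: these
# unique values define the full grid of expected observations).
full_index = pd.MultiIndex.from_product(
    [df["store"].unique(), df["day"].unique()],
    names=["store", "day"],
)

# Turn the index levels into ordinary columns with a fresh 0..n-1 index.
temp = full_index.to_frame(index=False)
```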

I want to build a new dataframe containing the observations that are missing from df, that is, the store-day combinations that are present in temp but not in df.

desired output


store    day
b         3      
b         4       
b         5      

This is one way: index both dataframes by the key columns and keep the rows of temp whose key does not appear in df.

gcols = ['store', 'day']
temp[~temp.set_index(gcols).index.isin(df.set_index(gcols).index)]
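To check the idea end to end, here is a self-contained sketch that rebuilds the question's sample data by hand and applies the index-based filter. The names present and missing are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
    "items": [4, 3, 1, 5, 2, 9, 1, 3],
})
temp = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "day":   [1, 2, 3, 4, 5] * 2,
})

gcols = ["store", "day"]

# Boolean array: True where temp's (store, day) pair also occurs in df.
present = temp.set_index(gcols).index.isin(df.set_index(gcols).index)

# Keep only the pairs that do not occur in df.
missing = temp[~present]
print(missing)
```

The duplicated (a, 1) row in df does not matter here, because isin only tests membership.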

My solution merges the two dataframes and uses items as a marker column: it will be NaN for the rows we want. I believe that for large dataframes this would be more efficient than the alternative using isin. Had items not been there, I would have added a marker column to df.

So first the merge. It's important to specify how='left' so that we keep the rows from temp that are not in df:

out = temp.merge(df, on=['store', 'day'], how='left')

In [23]: out
Out[23]: 
   store  day  items
0      a    1      4
1      a    1      3
2      a    2      1
3      a    3      5
4      a    4      2
5      a    5      9
6      b    1      1
7      b    2      3
8      b    3    NaN
9      b    4    NaN
10     b    5    NaN

You can see that the rows we want received NaN in their items column, since they came only from temp. Now let's keep just those rows and get rid of the marker column.

out[out['items'].isnull()].drop(['items'], axis=1)

   store  day
8      b    3
9      b    4
10     b    5
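If df had no column that could serve as a marker, one alternative (not part of the answer above) is to let pandas add the marker itself through the indicator parameter of merge, which creates a _merge column. A minimal sketch, assuming a hypothetical df_keys frame that holds only the key columns:

```python
import pandas as pd

# Hypothetical variant of df that has no items column.
df_keys = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
})
temp = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "day":   [1, 2, 3, 4, 5] * 2,
})

# indicator=True adds a "_merge" column whose value is "left_only",
# "right_only" or "both" for each row of the result.
merged = temp.merge(df_keys, on=["store", "day"], how="left", indicator=True)

# Rows found only in temp are the missing store-day combinations.
missing = merged.loc[merged["_merge"] == "left_only", ["store", "day"]]
print(missing)
```

This also avoids mistaking a genuine NaN in items for a missing row.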

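# A right merge keeps every row of temp; combinations absent from df get NaN in items.
newDF = pd.merge(df, temp, how='right', on=['store', 'day'])

# Keep only the rows that contain a NaN, i.e. the missing store-day combinations.
newDF[newDF.isnull().any(axis=1)]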
