I have a pandas dataframe df:
store day items
a 1 4
a 1 3
a 2 1
a 3 5
a 4 2
a 5 9
b 1 1
b 2 3
I have another pandas dataframe, temp, that is the Cartesian product of the unique stores and the unique days, that is, it looks like:
store day
0 a 1
1 a 2
2 a 3
3 a 4
4 a 5
5 b 1
6 b 2
7 b 3
8 b 4
9 b 5
I want to make a new DataFrame containing the observations missing from df, that is, the store-day combinations that are present in temp but not in df.
Desired output:
store day
b 3
b 4
b 5
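For anyone who wants to reproduce the example, the two frames can be built roughly as follows. This is only a sketch: the values of df are copied from the tables above, but the question does not say how temp was actually created, so the use of pd.MultiIndex.from_product here is an assumption.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
    "items": [4, 3, 1, 5, 2, 9, 1, 3],
})

# Assumed construction of temp: every pairing of a unique store with a
# unique day, turned back into an ordinary two-column frame.
temp = pd.MultiIndex.from_product(
    [df["store"].unique(), df["day"].unique()],
    names=["store", "day"],
).to_frame(index=False)

print(temp)
```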
This is one way (tmp is the frame the question calls temp):
gcols = ['store', 'day']
tmp[~tmp.set_index(gcols).index.isin(df.set_index(gcols).index)]
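Here is a self-contained sketch of the index-based approach above, run against the sample data from the question. The names present and missing are introduced only for this illustration, and tmp is typed in by hand rather than derived from df.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
    "items": [4, 3, 1, 5, 2, 9, 1, 3],
})
tmp = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "day":   [1, 2, 3, 4, 5] * 2,
})

gcols = ["store", "day"]

# Boolean array: True where a (store, day) pair of tmp also occurs in df.
present = tmp.set_index(gcols).index.isin(df.set_index(gcols).index)

# Keep only the pairs that do not occur in df.
missing = tmp[~present]
print(missing)
```

Negating the mask with ~ does the same job as comparing it with False, and is the more common spelling in pandas code.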
My solution merges the two dataframes and uses items as a marker column: it will be NaN for the rows we want. I believe that for large dataframes this would be more efficient than the alternative using isin. Had items not been there, I would have added a marker column to df.
So first the merge. It's important to specify how='left' so that we keep the rows from tmp that are not in df:
out = tmp.merge(df, on= ['store', 'day'], how = 'left')
In [23]: out
Out[23]:
store day items
0 a 1 4
1 a 1 3
2 a 2 1
3 a 3 5
4 a 4 2
5 a 5 9
6 b 1 1
7 b 2 3
8 b 3 NaN
9 b 4 NaN
10 b 5 NaN
You can see that the rows we want received NaN in their items column, since they came only from tmp. Now let's select those rows and get rid of the marker column.
out[out['items'].isnull()].drop(['items'], axis = 1)
store day
8 b 3
9 b 4
10 b 5
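If df had no column like items to act as a marker, one possible alternative (not part of the answer above, just a sketch) is to let merge create the marker itself through its indicator parameter. That adds a _merge column whose value is 'left_only' for rows found only in the left frame. The drop_duplicates call and the names merged and missing are choices made for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "store": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "day":   [1, 1, 2, 3, 4, 5, 1, 2],
    "items": [4, 3, 1, 5, 2, 9, 1, 3],
})
tmp = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "day":   [1, 2, 3, 4, 5] * 2,
})

keys = ["store", "day"]

# Left merge against the distinct key pairs of df; indicator=True adds a
# "_merge" column recording which side(s) each row came from.
merged = tmp.merge(df[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# Rows seen only in tmp are the missing store-day combinations.
missing = merged.loc[merged["_merge"] == "left_only", keys]
print(missing)
```

Merging on the de-duplicated keys keeps one row per combination, so the repeated (a, 1) observation in df does not produce extra rows in the result.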
newDF = pd.merge(df,temp,how='right',on=['store','day'])
newDF[newDF.isnull().any(axis=1)]