
Pandas Groupby.diff fill missing rows with zeros

I am sure this has been posted somewhere, or it is so simple that I don't see it, but I have had no luck finding a post. Any help would be greatly appreciated.

I am trying to do a groupby.diff, as shown below. Where a date is missing for a given ID and ticker, I need the delta to show a negative value.

df['delta'] = df.groupby(['ID', 'ticker', 'date'])['shares'].diff()

ID  ticker date         shares  delta
A   AAA    3/31/2012    904180  675010
A   AAA    12/31/2011   229170  NaN
A   BBB    3/31/2012    517756  390117
A   BBB    12/31/2011   127639  NaN
A   CCC    12/31/2011   1757    NaN
A   DDD    12/31/2011   500     NaN
B   AAA    3/31/2012    920920  554920
B   AAA    12/31/2011   366000  NaN
B   BBB    3/31/2012    524     393
B   BBB    12/31/2011   131     NaN

I think I need to pad/fill to get this:

ID  ticker date         shares  delta
A   AAA    3/31/2012    904180  675010
A   AAA    12/31/2011   229170  NaN
A   BBB    3/31/2012    517756  390117
A   BBB    12/31/2011   127639  NaN
A   CCC    3/31/2012    0       -1757
A   CCC    12/31/2011   1757    NaN
A   DDD    3/31/2012    0       -500
A   DDD    12/31/2011   500     NaN
B   AAA    3/31/2012    920920  554920
B   AAA    12/31/2011   366000  NaN
B   BBB    3/31/2012    524     393
B   BBB    12/31/2011   131     NaN

Thanks Again,

Using unstack + stack

# Pivot the dates into columns, then stack them back with dropna=False so that
# every (ID, ticker) pair gets one row per date; the missing shares become NaN
# and are then filled with 0.
New_df = (df.set_index(['ID', 'ticker', 'date'])
            .unstack('date')
            .stack(dropna=False)
            .reset_index()
            .fillna(0))

# Do not group by date: each (ID, ticker, date) group then holds a single row,
# so diff() returns NaN everywhere. Group by ID and ticker only.
New_df['delta'] = New_df.groupby(['ID', 'ticker'])['shares'].diff()
New_df
Out[316]: 
   ID ticker        date    shares     delta
0   A    AAA  12/31/2011  229170.0       NaN
1   A    AAA   3/31/2012  904180.0  675010.0
2   A    BBB  12/31/2011  127639.0       NaN
3   A    BBB   3/31/2012  517756.0  390117.0
4   A    CCC  12/31/2011    1757.0       NaN
5   A    CCC   3/31/2012       0.0   -1757.0
6   A    DDD  12/31/2011     500.0       NaN
7   A    DDD   3/31/2012       0.0    -500.0
8   B    AAA  12/31/2011  366000.0       NaN
9   B    AAA   3/31/2012  920920.0  554920.0
10  B    BBB  12/31/2011     131.0       NaN
11  B    BBB   3/31/2012     524.0     393.0
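
To make the point about the date key concrete, here is a minimal sketch using only the two A/CCC rows from the output above. The frame name `demo` and the variables `with_date` and `without_date` are illustrative and not part of the original code.

```python
import pandas as pd

# The two A/CCC rows from the padded output above.
demo = pd.DataFrame({
    "ID": ["A", "A"],
    "ticker": ["CCC", "CCC"],
    "date": ["12/31/2011", "3/31/2012"],
    "shares": [1757, 0],
})

# Grouping by date as well puts each row in its own group,
# so there is no previous row to subtract and every delta is NaN.
with_date = demo.groupby(["ID", "ticker", "date"])["shares"].diff()

# Grouping by ID and ticker only keeps both dates in one group,
# so the second row receives the change from the first.
without_date = demo.groupby(["ID", "ticker"])["shares"].diff()
```

Here `with_date` is NaN for both rows, while `without_date` holds NaN for the first row and the drop of 1757 shares for the second.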

After sorting to match the order in the question (newest date first within each ID and ticker):

New_df.sort_values(['ID','ticker','date'],ascending=[True,True,False])
Out[318]: 
   ID ticker        date    shares     delta
1   A    AAA   3/31/2012  904180.0  675010.0
0   A    AAA  12/31/2011  229170.0       NaN
3   A    BBB   3/31/2012  517756.0  390117.0
2   A    BBB  12/31/2011  127639.0       NaN
5   A    CCC   3/31/2012       0.0   -1757.0
4   A    CCC  12/31/2011    1757.0       NaN
7   A    DDD   3/31/2012       0.0    -500.0
6   A    DDD  12/31/2011     500.0       NaN
9   B    AAA   3/31/2012  920920.0  554920.0
8   B    AAA  12/31/2011  366000.0       NaN
11  B    BBB   3/31/2012     524.0     393.0
10  B    BBB  12/31/2011     131.0       NaN
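
For anyone who wants to run this end to end, below is a self-contained sketch of the same unstack + stack idea. It differs from the answer in two ways, both of which are my assumptions rather than the original code: the dates are parsed with `pd.to_datetime` so they sort chronologically (the string dates above only sort correctly because "12/..." happens to come before "3/..." as text), and the gaps are filled with `unstack(fill_value=0)` followed by a plain `stack()` instead of `stack(dropna=False)`. The name `padded` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":     ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "DDD", "AAA", "AAA", "BBB", "BBB"],
    "date":   ["3/31/2012", "12/31/2011", "3/31/2012", "12/31/2011", "12/31/2011",
               "12/31/2011", "3/31/2012", "12/31/2011", "3/31/2012", "12/31/2011"],
    "shares": [904180, 229170, 517756, 127639, 1757, 500, 920920, 366000, 524, 131],
})

# Assumption: real datetimes are preferable to strings for ordering.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Pivot dates into columns, filling absent (ID, ticker, date) cells with 0,
# then stack back to one row per (ID, ticker, date).
padded = (
    df.set_index(["ID", "ticker", "date"])["shares"]
      .unstack("date", fill_value=0)
      .stack()
      .rename("shares")
      .reset_index()
)

# Rows come back sorted oldest-first within each pair; diff within (ID, ticker) only.
padded = padded.sort_values(["ID", "ticker", "date"]).reset_index(drop=True)
padded["delta"] = padded.groupby(["ID", "ticker"])["shares"].diff()
```

Only (ID, ticker) pairs that already exist in the data are padded, so B/CCC and B/DDD are not created. To get the newest-first layout from the question, sort `padded` with `ascending=[True, True, False]` as shown above.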

