A pandas dataframe :
>>> df
sales net_pft
STK_ID RPT_Date
002138 20140930 3.325 0.607
20150930 3.619 0.738
20160930 4.779 0.948
600004 20140930 13.986 2.205
20150930 14.226 3.080
20160930 15.499 3.619
600660 20140930 31.773 5.286
20150930 31.040 6.333
20160930 40.062 7.186
Just want to know how to get the output as the each row's value is divided by its first row of each group, like this:
sales net_pft
STK_ID RPT_Date
002138 20140930 1.000 1.000
20150930 1.088 1.216
20160930 1.437 1.562
600004 20140930 1.000 1.000
20150930 1.017 1.397
20160930 1.108 1.641
600660 20140930 1.000 1.000
20150930 0.977 1.198
20160930 1.261 1.359
Thanks,
import pandas as pd
df = pd.DataFrame({'RPT_Date': ['20140930', '20150930', '20160930', '20140930', '20150930', '20160930', '20140930', '20150930', '20160930'], 'STK_ID': ['002138', '002138', '002138', '600004', '600004', '600004', '600660', '600660', '600660'], 'net_pft': [0.607, 0.738, 0.948, 2.205, 3.080, 3.619, 5.286, 6.333, 7.186], 'sales': [3.325, 3.619, 4.779, 13.986, 14.226, 15.499, 31.773, 31.040, 40.062]})
df = df.set_index(['STK_ID','RPT_Date'])
firsts = (df.groupby(level=['STK_ID']).transform('first'))
result = df / firsts
yields
net_pft sales
STK_ID RPT_Date
002138 20140930 1.000000 1.000000
20150930 1.215815 1.088421
20160930 1.561779 1.437293
600004 20140930 1.000000 1.000000
20150930 1.396825 1.017160
20160930 1.641270 1.108180
600660 20140930 1.000000 1.000000
20150930 1.198070 0.976930
20160930 1.359440 1.260882
The main trick above is to use groupby/transform('first')
to create a DataFrame which is the same shape as df
but whose values come from the first row of each group:
firsts = df.groupby(level=['STK_ID']).transform('first')
# net_pft sales
# STK_ID RPT_Date
# 002138 20140930 0.607 3.325
# 20150930 0.607 3.325
# 20160930 0.607 3.325
# 600004 20140930 2.205 13.986
# 20150930 2.205 13.986
# 20160930 2.205 13.986
# 600660 20140930 5.286 31.773
# 20150930 5.286 31.773
# 20160930 5.286 31.773
Although this is a profligate use of memory, this is likely the quickest way to obtain the desired result since it avoids looping through the groups in Python.
If the above code raises a TypeError: Transform function invalid for data types
in Pandas version 0.13, you could try using this workaround:
result = list()
for key, grp in df.groupby(level=['STK_ID']):
result.append(grp/grp.iloc[0])
result = pd.concat(result)
print(result)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.