How to divide a pandas DataFrame's values by the first row of each group?
A pandas DataFrame:
>>> df
                  sales  net_pft
STK_ID RPT_Date
002138 20140930   3.325    0.607
       20150930   3.619    0.738
       20160930   4.779    0.948
600004 20140930  13.986    2.205
       20150930  14.226    3.080
       20160930  15.499    3.619
600660 20140930  31.773    5.286
       20150930  31.040    6.333
       20160930  40.062    7.186
I want to divide each row's values by the first row of its group, to get output like this:
                 sales  net_pft
STK_ID RPT_Date
002138 20140930  1.000    1.000
       20150930  1.088    1.216
       20160930  1.437    1.562
600004 20140930  1.000    1.000
       20150930  1.017    1.397
       20160930  1.108    1.641
600660 20140930  1.000    1.000
       20150930  0.977    1.198
       20160930  1.261    1.359
Thanks,
import pandas as pd
df = pd.DataFrame({'RPT_Date': ['20140930', '20150930', '20160930', '20140930', '20150930', '20160930', '20140930', '20150930', '20160930'], 'STK_ID': ['002138', '002138', '002138', '600004', '600004', '600004', '600660', '600660', '600660'], 'net_pft': [0.607, 0.738, 0.948, 2.205, 3.080, 3.619, 5.286, 6.333, 7.186], 'sales': [3.325, 3.619, 4.779, 13.986, 14.226, 15.499, 31.773, 31.040, 40.062]})
df = df.set_index(['STK_ID','RPT_Date'])
firsts = (df.groupby(level=['STK_ID']).transform('first'))
result = df / firsts
yields
                  net_pft     sales
STK_ID RPT_Date
002138 20140930  1.000000  1.000000
       20150930  1.215815  1.088421
       20160930  1.561779  1.437293
600004 20140930  1.000000  1.000000
       20150930  1.396825  1.017160
       20160930  1.641270  1.108180
600660 20140930  1.000000  1.000000
       20150930  1.198070  0.976930
       20160930  1.359440  1.260882
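The output above has the columns in a different order and with more decimals than the table in the question. If the exact layout matters, one option is to reorder and round afterwards. This is only a sketch; the name `pretty` is made up for illustration and the data is the sample from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'RPT_Date': ['20140930', '20150930', '20160930'] * 3,
    'STK_ID': ['002138'] * 3 + ['600004'] * 3 + ['600660'] * 3,
    'net_pft': [0.607, 0.738, 0.948, 2.205, 3.080, 3.619, 5.286, 6.333, 7.186],
    'sales': [3.325, 3.619, 4.779, 13.986, 14.226, 15.499, 31.773, 31.040, 40.062],
}).set_index(['STK_ID', 'RPT_Date'])

# Divide every row by the first row of its STK_ID group.
firsts = df.groupby(level='STK_ID').transform('first')
result = df / firsts

# Put 'sales' first and round to three decimals, as in the question.
pretty = result[['sales', 'net_pft']].round(3)
print(pretty)
```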
The main trick above is to use groupby/transform('first') to create a DataFrame which has the same shape as df but whose values come from the first row of each group:
firsts = df.groupby(level=['STK_ID']).transform('first')
#                  net_pft   sales
# STK_ID RPT_Date
# 002138 20140930    0.607   3.325
#        20150930    0.607   3.325
#        20160930    0.607   3.325
# 600004 20140930    2.205  13.986
#        20150930    2.205  13.986
#        20160930    2.205  13.986
# 600660 20140930    5.286  31.773
#        20150930    5.286  31.773
#        20160930    5.286  31.773
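One caveat that may be worth checking on your own data: by default the 'first' aggregation skips missing values, so if the first row of a group contains NaN, transform('first') uses the first non-null value in that column, not the literal first row. The sample data has no NaN, so it makes no difference there. A small sketch with made-up data (`demo` is hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical one-group frame whose first row is missing.
demo = pd.DataFrame(
    {'sales': [np.nan, 2.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [('A', '20140930'), ('A', '20150930'), ('A', '20160930')],
        names=['STK_ID', 'RPT_Date'],
    ),
)

grouped = demo.groupby(level='STK_ID')

# 'first' picks the first non-null value per column within each group.
first_non_null = grouped.transform('first')

# head(1) keeps the literal first row of each group, NaN included.
literal_first = grouped.head(1)

print(first_non_null)
print(literal_first)
```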
Although this is a profligate use of memory, this is likely the quickest way to obtain the desired result since it avoids looping through the groups in Python.
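If the full-size intermediate DataFrame is a concern, a possible alternative (not part of the answer above, and not benchmarked here) is to keep only one row per group and let DataFrame.div broadcast it across the STK_ID level of the MultiIndex. A sketch, with `first_rows` and `result_small` as illustrative names:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'RPT_Date': ['20140930', '20150930', '20160930'] * 3,
    'STK_ID': ['002138'] * 3 + ['600004'] * 3 + ['600660'] * 3,
    'net_pft': [0.607, 0.738, 0.948, 2.205, 3.080, 3.619, 5.286, 6.333, 7.186],
    'sales': [3.325, 3.619, 4.779, 13.986, 14.226, 15.499, 31.773, 31.040, 40.062],
}).set_index(['STK_ID', 'RPT_Date'])

# One row per STK_ID instead of one row per original row.
first_rows = df.groupby(level='STK_ID').first()

# Align on the STK_ID level so each group is divided by its own first row.
result_small = df.div(first_rows, level='STK_ID')
print(result_small)
```

Whether this is actually faster or lighter than the transform version would depend on the data, so it is worth timing both on a realistic frame.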
If the above code raises a TypeError: Transform function invalid for data types in Pandas version 0.13, you could try using this workaround:
result = list()
for key, grp in df.groupby(level=['STK_ID']):
    # divide each group by its own first row
    result.append(grp/grp.iloc[0])
result = pd.concat(result)
print(result)