Pandas Dataframe group by within function
I have a pandas dataframe of stock price data that looks like this:
ticker date open high low close volume
0 A2M 2015-03-31 0.555 0.595 0.530 0.565 4816294.0
1 A2M 2015-04-30 0.475 0.500 0.475 0.500 531816.0
2 A2M 2015-05-29 0.475 0.475 0.455 0.465 5665854.0
3 A2M 2015-06-30 0.640 0.650 0.630 0.640 1691918.0
4 A2M 2015-07-31 0.750 0.760 0.730 0.735 714927.0
... ... ... ... ... ... ... ...
45479 ZFX 2008-01-31 10.090 10.490 9.860 10.280 4484500.0
45480 ZFX 2008-02-29 10.650 11.130 10.650 11.130 15525073.0
45481 ZFX 2008-03-31 10.010 10.080 9.920 9.980 4256951.0
45482 ZFX 2008-04-30 9.900 10.190 9.850 10.100 3522569.0
45483 ZFX 2008-05-30 9.750 9.750 9.450 9.500 8270995.0
My goal is to add columns to the dataframe holding the 3, 6, 9 and 12 month rate of change. I wrote the functions below:
import pandas as pd

# defines the ROC function
def roc(df, roc_periods):
    roc = df['close'] / df['close'].shift(roc_periods) - 1
    return pd.DataFrame(roc)

# defines the periods for the ROC calculations
def roc_periods(df, months):
    for month in months:
        df['{}mo_roc'.format(month)] = roc(df, month)
    return df

# specify the roc periods to calculate
periods = roc_periods(monthly_raw_data, [3, 6, 9, 12])
The output dataframe is as follows:
ticker date open high low close volume 3mo_roc \
0 A2M 2015-03-31 0.555 0.595 0.530 0.565 4816294.0 NaN
1 A2M 2015-04-30 0.475 0.500 0.475 0.500 531816.0 NaN
2 A2M 2015-05-29 0.475 0.475 0.455 0.465 5665854.0 NaN
3 A2M 2015-06-30 0.640 0.650 0.630 0.640 1691918.0 0.132743
4 A2M 2015-07-31 0.750 0.760 0.730 0.735 714927.0 0.470000
... ... ... ... ... ... ... ... ...
45479 ZFX 2008-01-31 10.090 10.490 9.860 10.280 4484500.0 -0.382583
45480 ZFX 2008-02-29 10.650 11.130 10.650 11.130 15525073.0 -0.229224
45481 ZFX 2008-03-31 10.010 10.080 9.920 9.980 4256951.0 -0.195161
45482 ZFX 2008-04-30 9.900 10.190 9.850 10.100 3522569.0 -0.017510
45483 ZFX 2008-05-30 9.750 9.750 9.450 9.500 8270995.0 -0.146451
6mo_roc 9mo_roc 12mo_roc
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
45479 -0.483677 -0.378852 -0.373171
45480 -0.340640 -0.367614 -0.334330
45481 -0.436795 -0.469713 -0.367554
45482 -0.393393 -0.492717 -0.389728
45483 -0.342105 -0.437204 -0.460227
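The problem is that I can't seem to get the .groupby() method to work. As a result, the rate-of-change columns roll across all tickers as if they were one continuous series, instead of being calculated separately for each ticker. I have tried placing the .groupby() method in various places in the code, but I get a KeyError: 'ticker' message. For the purposes of this question, I have removed my groupby attempts altogether.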
You can pass arguments to the function that is applied after the groupby. Just change roc_periods to make use of that:
# defines the periods for the ROC calculations
def roc_periods(df, months):
    for month in months:
        # group_keys=False keeps the original row index, so the result aligns with df
        df['{}mo_roc'.format(month)] = df.groupby('ticker', group_keys=False).apply(roc, month)
    return df
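If you would rather not go through apply, the same per-ticker calculation can probably be written with a grouped shift. The following is only a minimal sketch under some assumptions: the toy `prices` frame and the helper name `add_roc_columns` are made up for illustration, and the rows are assumed to be already sorted by date within each ticker.

```python
import pandas as pd

# Hypothetical toy data: two tickers with five month-end closes each.
prices = pd.DataFrame({
    "ticker": ["AAA"] * 5 + ["BBB"] * 5,
    "close": [1.0, 2.0, 4.0, 8.0, 16.0, 10.0, 20.0, 40.0, 80.0, 160.0],
})

def add_roc_columns(df, months):
    """Return a copy of df with one '<n>mo_roc' column per period, computed per ticker."""
    out = df.copy()
    for month in months:
        # A grouped shift only looks back within the same ticker, so the first
        # `month` rows of every ticker come out as NaN instead of borrowing
        # closes from the previous ticker.
        shifted = out.groupby("ticker")["close"].shift(month)
        out["{}mo_roc".format(month)] = out["close"] / shifted - 1
    return out

result = add_roc_columns(prices, [1, 3])
print(result)
```

With this shape the first rows of each ticker stay NaN for every period, which is an easy way to check that nothing is leaking across tickers. The helper works on a copy, so the original frame is left unchanged.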