Pandas DataFrame group by within function
I have a pandas DataFrame with stock price data, shown below:
ticker date open high low close volume
0 A2M 2015-03-31 0.555 0.595 0.530 0.565 4816294.0
1 A2M 2015-04-30 0.475 0.500 0.475 0.500 531816.0
2 A2M 2015-05-29 0.475 0.475 0.455 0.465 5665854.0
3 A2M 2015-06-30 0.640 0.650 0.630 0.640 1691918.0
4 A2M 2015-07-31 0.750 0.760 0.730 0.735 714927.0
... ... ... ... ... ... ... ...
45479 ZFX 2008-01-31 10.090 10.490 9.860 10.280 4484500.0
45480 ZFX 2008-02-29 10.650 11.130 10.650 11.130 15525073.0
45481 ZFX 2008-03-31 10.010 10.080 9.920 9.980 4256951.0
45482 ZFX 2008-04-30 9.900 10.190 9.850 10.100 3522569.0
45483 ZFX 2008-05-30 9.750 9.750 9.450 9.500 8270995.0
My goal is to add columns to the DataFrame for the 3-, 6-, 9-, and 12-month rate of change (ROC). I have developed the functions below:
import pandas as pd

#defines the ROC function
def roc(df, roc_periods):
    roc = df['close'] / df['close'].shift(roc_periods) - 1
    return pd.DataFrame(roc)

#defines the periods for the ROC calculations
def roc_periods(df, months):
    for month in months:
        df['{}mo_roc'.format(month)] = roc(df, month)
    return df

#specify the roc periods to calculate
periods = roc_periods(monthly_raw_data, [3, 6, 9, 12])
The output DataFrame is as follows:
ticker date open high low close volume 3mo_roc \
0 A2M 2015-03-31 0.555 0.595 0.530 0.565 4816294.0 NaN
1 A2M 2015-04-30 0.475 0.500 0.475 0.500 531816.0 NaN
2 A2M 2015-05-29 0.475 0.475 0.455 0.465 5665854.0 NaN
3 A2M 2015-06-30 0.640 0.650 0.630 0.640 1691918.0 0.132743
4 A2M 2015-07-31 0.750 0.760 0.730 0.735 714927.0 0.470000
... ... ... ... ... ... ... ... ...
45479 ZFX 2008-01-31 10.090 10.490 9.860 10.280 4484500.0 -0.382583
45480 ZFX 2008-02-29 10.650 11.130 10.650 11.130 15525073.0 -0.229224
45481 ZFX 2008-03-31 10.010 10.080 9.920 9.980 4256951.0 -0.195161
45482 ZFX 2008-04-30 9.900 10.190 9.850 10.100 3522569.0 -0.017510
45483 ZFX 2008-05-30 9.750 9.750 9.450 9.500 8270995.0 -0.146451
6mo_roc 9mo_roc 12mo_roc
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
45479 -0.483677 -0.378852 -0.373171
45480 -0.340640 -0.367614 -0.334330
45481 -0.436795 -0.469713 -0.367554
45482 -0.393393 -0.492717 -0.389728
45483 -0.342105 -0.437204 -0.460227
The problem is that I cannot get the .groupby() method to work. As a result, the rate-of-change columns roll through all tickers as if they were one continuous series, rather than being calculated separately for each ticker. I've tried placing .groupby() at various points in the code, but I receive KeyError: 'ticker' messages. For the purposes of asking here, I've removed my groupby attempts altogether.
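To make the symptom concrete, here is a minimal sketch on made-up data (the tickers AAA and BBB, the prices, and the column name roc_1 are hypothetical, not taken from the question's dataset). It applies the same close / close.shift(n) - 1 formula without any grouping and shows that the first row of the second ticker is compared against the last row of the first ticker:

```python
import pandas as pd

# Hypothetical toy data: two tickers with three closes each.
toy = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "close": [1.0, 2.0, 4.0, 10.0, 20.0, 40.0],
})

# Same formula as roc() in the question, with a 1-row lookback and no grouping.
toy["roc_1"] = toy["close"] / toy["close"].shift(1) - 1

# Row 3 is the first BBB row, yet it gets a value computed from AAA's last close
# (10.0 / 4.0 - 1) instead of NaN.
print(toy)
```

In a per-ticker calculation, the first BBB row would be NaN because BBB has no earlier close to compare against.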
You can pass parameters to a function that you apply after a groupby: any extra arguments given to .apply() after the function are forwarded to it. Just change roc_periods to use it:
#defines the periods for the ROC calculations
def roc_periods(df, months):
    for month in months:
        # group_keys=False keeps the original row index so the result aligns with df
        df['{}mo_roc'.format(month)] = df.groupby('ticker', group_keys=False).apply(roc, month)
    return df
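As an alternative sketch, not part of the answer above, the same per-ticker calculation can probably be written without .apply() by shifting the grouped close column directly. The helper name add_roc_columns and the sample data are hypothetical, and the sketch assumes the rows are already sorted by date within each ticker:

```python
import pandas as pd

def add_roc_columns(df, months):
    """Return a copy of df with one '<n>mo_roc' column per entry in months."""
    out = df.copy()
    for month in months:
        # shift() on the grouped column looks back `month` rows within each ticker only.
        previous_close = out.groupby("ticker")["close"].shift(month)
        out["{}mo_roc".format(month)] = out["close"] / previous_close - 1
    return out

# Hypothetical sample: two tickers, four closes each, already in date order.
sample = pd.DataFrame({
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "close": [1.0, 2.0, 4.0, 8.0, 10.0, 20.0, 40.0, 80.0],
})

result = add_roc_columns(sample, [1, 2])
print(result)
```

Because the grouped shift returns a Series with the original row index, the assignment aligns row by row, and the first rows of each ticker stay NaN instead of borrowing prices from the previous ticker. Working on a copy also leaves the input DataFrame unchanged, which is a design choice of this sketch rather than something the original functions do.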