Pandas rolling weighted average
I want to apply a weighted rolling average to a large time series, stored as a pandas dataframe, where the weights are different for each day. Here's a subset of the dataframe:
DF:
Date v_std vertical
2010-10-01 1.909 545.231
2010-10-02 1.890 538.610
2010-10-03 1.887 542.759
2010-10-04 1.942 545.221
2010-10-05 1.847 536.832
2010-10-06 1.884 538.858
2010-10-07 1.864 538.017
2010-10-08 1.833 540.737
2010-10-09 1.847 537.906
2010-10-10 1.881 538.210
2010-10-11 1.868 544.238
2010-10-12 1.856 534.878
I want to take a rolling average of the vertical column, using v_std as the weights. I've been using this weighted average function:
def wavg(group, avg_name, weight_name):
    d = group[avg_name]
    w = group[weight_name]
    try:
        return (d * w).sum() / w.sum()
    except ZeroDivisionError:
        return d.mean()
But I can't figure out how to implement this for a rolling weighted average. I assume it is similar to
df.rolling(window = 7).apply(wavg, "vertical", "v_std")
or utilizing rolling_apply? Or will I have to write a new function altogether? Thank you!
Here is my solution for a rolling weighted average, using pandas _Rolling_and_Expanding:
First, I've added a new column for the multiplication:
df['mul'] = df['value'] * df['weight']
Then write the function you would like to apply:
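import pandas as pd
# Private pandas class; its name and location can differ between pandas versions.
from pandas.core.window.rolling import _Rolling_and_Expanding

def weighted_average(x):
    # x is the Rolling object, so each .sum() below is a rolling sum.
    d = []
    d.append(x['mul'].sum()/x['weight'].sum())
    return pd.Series(d, index=['wavg'])

# Attach the function as a method of rolling objects.
_Rolling_and_Expanding.weighted_average = weighted_average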
Apply the function with the following line:
result = mean_per_group.rolling(window=7).weighted_average()
Then you can get the series you wanted with:
result['wavg']
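The same idea (a rolling sum of value times weight, divided by a rolling sum of the weights) can also be sketched with only the public rolling API, without patching pandas internals. This is a minimal sketch, not the answerer's code: the helper name rolling_weighted_mean is made up for illustration, and the data is the first eight rows from the question.

```python
import pandas as pd

# First eight rows of the question's data.
df = pd.DataFrame(
    {
        "v_std": [1.909, 1.890, 1.887, 1.942, 1.847, 1.884, 1.864, 1.833],
        "vertical": [545.231, 538.610, 542.759, 545.221,
                     536.832, 538.858, 538.017, 540.737],
    },
    index=pd.date_range("2010-10-01", periods=8, freq="D"),
)

def rolling_weighted_mean(values, weights, window):
    """Hypothetical helper: rolling weighted mean as a ratio of two rolling sums."""
    weighted_sum = (values * weights).rolling(window=window).sum()
    weight_sum = weights.rolling(window=window).sum()
    return weighted_sum / weight_sum

# The first window - 1 rows are NaN because the window is not yet full.
df["wavg"] = rolling_weighted_mean(df["vertical"], df["v_std"], window=7)
```

If a window's weights sum to zero, the division gives NaN or inf rather than raising an exception, so that case would need separate handling.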
This is how I implemented a weighted mean. It would be nice if there were a pairwise_apply for this sort of thing.
from pandas.core.window import _flex_binary_moment, _Rolling_and_Expanding

def weighted_mean(self, weights, **kwargs):
    weights = self._shallow_copy(weights)
    window = self._get_window(weights)

    def _get_weighted_mean(X, Y):
        X = X.astype('float64')
        Y = Y.astype('float64')
        sum_f = lambda x: x.rolling(window, self.min_periods, center=self.center).sum(**kwargs)
        return sum_f(X * Y) / sum_f(Y)

    return _flex_binary_moment(self._selected_obj, weights._selected_obj,
                               _get_weighted_mean, pairwise=True)

_Rolling_and_Expanding.weighted_mean = weighted_mean
df['mean'] = df['vertical'].rolling(window = 7).weighted_mean(df['v_std'])
The following code should do it (pardon my long naming conventions). It is quite simple: it takes advantage of the newer rolling.apply in pandas, which added raw=False so that the function receives more information than a 1-D array:
import numpy as np

def get_weighted_average(dataframe,window,columnname_data,columnname_weights):
    # Move the weights into the index so each window carries them along.
    processed_dataframe=dataframe.loc[:,(columnname_data,columnname_weights)].set_index(columnname_weights)
    def get_mean_withweights(processed_dataframe_windowed):
        # With raw=False each window is a Series whose index holds the weights.
        return np.average(a=processed_dataframe_windowed,weights=processed_dataframe_windowed.index)
    return processed_dataframe.rolling(window=window).apply(func=get_mean_withweights,raw=False)
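A variation on the raw=False approach keeps the original index and looks the weights up by the labels of each window, instead of moving the weights into the index. The sketch below is only an illustration under a few assumptions: the name window_weighted_mean is made up, the index is assumed to be unique, and the data is the first six rows from the question with a window of 3.

```python
import numpy as np
import pandas as pd

# First six rows of the question's data.
df = pd.DataFrame(
    {
        "v_std": [1.909, 1.890, 1.887, 1.942, 1.847, 1.884],
        "vertical": [545.231, 538.610, 542.759, 545.221, 536.832, 538.858],
    },
    index=pd.date_range("2010-10-01", periods=6, freq="D"),
)

def window_weighted_mean(window_values):
    # With raw=False, window_values is a Series that keeps the original index,
    # so the matching weights can be selected by label (assumes a unique index).
    window_weights = df.loc[window_values.index, "v_std"]
    return np.average(window_values.to_numpy(), weights=window_weights.to_numpy())

df["wavg"] = df["vertical"].rolling(window=3).apply(window_weighted_mean, raw=False)
```

Calling a Python function for every window is usually much slower than dividing two rolling sums, so for a large time series this form is probably better suited to checking results than to production use.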
Based on orherman's answer, I created the following class, which should be easier to use and has an API similar to DataFrame.rolling():
from pandas.core.window.rolling import RollingAndExpandingMixin

class RollingWeightedAverageDataFrame:
    def __init__(self, df):
        self.df = df
        self.col_names = list(df.columns)
        assert len(self.col_names) == 2, "Unexpected input, dataframe should have 2 columns"

    def rolling(self, window, min_periods):
        self.window = window
        self.min_periods = min_periods
        return self

    def weighted_average(self):
        self.df['mul'] = self.df[self.col_names[0]] * self.df[self.col_names[1]]
        def _weighted_average(x):
            return (x['mul'].sum() / x[self.col_names[1]].sum())
        RollingAndExpandingMixin.weighted_average = _weighted_average
        return self.df[[self.col_names[0], self.col_names[1], 'mul']].rolling(window=self.window, min_periods=self.min_periods).weighted_average()
Suppose your code has a dataframe with columns 'value' and 'weight', and you want a window of 7 with a minimum of 5 periods. Just add the following:
# Parentheses let the method chain span several lines.
df['wavg'] = (
    RollingWeightedAverageDataFrame(df[['value','weight']])
    .rolling(window=7, min_periods=5)
    .weighted_average()
)
I believe you may be looking for the win_type parameter of rolling(). You can specify different types of windows, like 'triang' (triangular) ...
You can read about the parameter at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
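One caveat, as far as I understand it: a win_type window applies the same set of weights at every position, chosen by the window shape, whereas the question takes the weights from another column, so they change from window to window. win_type also requires SciPy. The sketch below mimics a fixed-weight window with NumPy only. It does not call win_type, and the weights and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])

# Hypothetical fixed weights, one per position inside a 3-row window.
fixed_weights = np.array([0.5, 1.0, 0.5])

def fixed_weight_mean(window_values):
    # raw=True passes each window as a NumPy array of length 3.
    return np.dot(window_values, fixed_weights) / fixed_weights.sum()

# Every window reuses the same weights, unlike weights read from a column.
smoothed = values.rolling(window=3).apply(fixed_weight_mean, raw=True)
```

With weights that vary per row, as in the question, the ratio of two rolling sums is the more direct fit.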