Pandas moving average using standard deviation in Python

Question

I want to smooth the noise with a moving average filter after fitting a regression model with a RandomForestRegressor on a data set I am considering, found in this link.

import pandas as pd
import math
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error, make_scorer
from sklearn.model_selection import train_test_split
from math import sqrt


n_features=3000

df = pd.read_csv('cubic32.csv')

for i in range(1,n_features):
    df['X_t'+str(i)] = df['X'].shift(i)

print(df)

df.dropna(inplace=True)


X = df.drop('Y', axis=1)
y = df['Y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

X_train = X_train.drop('time', axis=1)
X_test = X_test.drop('time', axis=1)


parameters = {'n_estimators': [10]}
clf_rf = RandomForestRegressor(random_state=1)
clf = GridSearchCV(clf_rf, parameters, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
model = clf.fit(X_train, y_train)
model.cv_results_['params'][model.best_index_]
math.sqrt(model.best_score_*-1)
model.cv_results_['mean_test_score']  # grid_scores_ was removed from scikit-learn; cv_results_ replaces it

#####
print()
print(model.cv_results_['mean_test_score'])
print("The best score: ",model.best_score_)

print("RMSE:",math.sqrt(model.best_score_*-1))

clf_rf.fit(X_train,y_train)
modelPrediction = clf_rf.predict(X_test)
print(modelPrediction)

print("Number of predictions:",len(modelPrediction))

meanSquaredError=mean_squared_error(y_test, modelPrediction)
print("Mean Square Error (MSE):", meanSquaredError)
rootMeanSquaredError = sqrt(meanSquaredError)
print("Root-Mean-Square Error (RMSE):", rootMeanSquaredError)

fig, ax = plt.subplots()
index_values=range(0,len(y_test))

y_test.sort_index(inplace=True)
X_test.sort_index(inplace=True)

modelPred_test = clf_rf.predict(X_test)
ax.plot(pd.Series(index_values), y_test.values)

# pd.rolling_mean was removed from pandas; use Series.rolling(...).mean() instead
smoothed = pd.Series(modelPred_test).rolling(window=90, min_periods=90, center=False).mean()
PlotInOne=pd.DataFrame(pd.concat([pd.Series(smoothed), pd.Series(y_test.values)], axis=1))
plt.figure(); PlotInOne.plot(); plt.legend(loc='best')

However, the plot of the predicted values (the blue line) looks very coarse, as shown below.

The orange line is a plot of the actual values.

How can I calculate the standard deviation of the prediction (the blue line) in the plot above and pass it as the window parameter of the moving average? Currently I set the size of the moving window manually to 50, but I would like to pass the value of the standard deviation instead.

smoothed = pd.Series(modelPred_test).rolling(window=50, min_periods=50, center=False).mean()

Answer 1

smoothed = pd.Series(modelPred_test).rolling(window=50, min_periods=50, center=False).std()
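
To illustrate what this line computes, here is a minimal sketch that puts the rolling standard deviation next to the rolling mean. The synthetic `pred` array is an assumption standing in for `modelPred_test`; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for modelPred_test (assumption: a noisy 1-D array).
rng = np.random.default_rng(0)
pred = np.sin(np.linspace(0, 6, 300)) + rng.normal(scale=0.3, size=300)

window = 50
rolling = pd.Series(pred).rolling(window=window, min_periods=window)

rolling_mean = rolling.mean()  # the smoothed signal
rolling_std = rolling.std()    # the local spread inside each window

# With min_periods equal to the window, the first window - 1 entries are NaN.
print(rolling_mean.isna().sum(), rolling_std.isna().sum())
```

Note that `.std()` returns the local spread of the predictions rather than a smoothed curve, so it is probably more useful for drawing a band around the rolling mean than as a replacement for it.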

Answer 2

pd.Series(modelPred_test).rolling(window=int(round(np.std(modelPred_test)))).mean()

You can also pass the standard deviation as the minimum window size (min_periods).
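
A minimal sketch of this idea follows, assuming the predictions form a 1-D numeric array. The helper `std_window`, its `scale` and `min_window` parameters, and the synthetic `pred` data are all hypothetical names introduced for illustration. Because the standard deviation is measured in the units of the prediction and not in samples, using it as a window size is a heuristic, so the sketch clamps the result to a valid integer range.

```python
import numpy as np
import pandas as pd


def std_window(values, scale=1.0, min_window=1):
    """Hypothetical helper: derive an integer window size from the std of values.

    The result is clamped to [min_window, len(values)] so that it is always
    a usable rolling window for non-empty input.
    """
    values = np.asarray(values, dtype=float)
    window = int(round(scale * np.std(values)))
    return max(min_window, min(window, len(values)))


# Synthetic stand-in for modelPred_test (assumption, not the asker's data).
rng = np.random.default_rng(1)
pred = 100 * np.sin(np.linspace(0, 6, 400)) + rng.normal(scale=20, size=400)

window = std_window(pred)

# Use the same value for the window and for min_periods.
smoothed = pd.Series(pred).rolling(window=window, min_periods=window).mean()
print("window derived from std:", window)
```

If the derived window comes out too small or too large for the data, the `scale` factor can be adjusted; that tuning step is an assumption of this sketch and not part of the original answer.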

2 answers

Answer 1 (0) 2017-09-05 23:57:23

Answer 2 (0, accepted) 2017-09-06 03:02:06