Pandas moving average using a standard deviation in Python
I want to smooth the noise in my predictions using a moving average filter
after fitting a regression model with a RandomForestRegressor
to a data set I am considering using, found in this link.
import pandas as pd
import math
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error, make_scorer
from sklearn.model_selection import train_test_split
from math import sqrt
n_features=3000
df = pd.read_csv('cubic32.csv')
# Build lagged copies of X as extra features
for i in range(1, n_features):
    df['X_t' + str(i)] = df['X'].shift(i)
print(df)
df.dropna(inplace=True)
X = df.drop('Y', axis=1)
y = df['Y']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)
X_train = X_train.drop('time', axis=1)
X_test = X_test.drop('time', axis=1)
parameters = {'n_estimators': [10]}
clf_rf = RandomForestRegressor(random_state=1)
clf = GridSearchCV(clf_rf, parameters, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
model = clf.fit(X_train, y_train)
model.cv_results_['params'][model.best_index_]
math.sqrt(model.best_score_*-1)
# grid_scores_ was removed in scikit-learn 0.20; cv_results_ replaces it
print()
print(model.cv_results_['mean_test_score'])
print("The best score: ",model.best_score_)
print("RMSE:",math.sqrt(model.best_score_*-1))
clf_rf.fit(X_train,y_train)
modelPrediction = clf_rf.predict(X_test)
print(modelPrediction)
print("Number of predictions:",len(modelPrediction))
meanSquaredError=mean_squared_error(y_test, modelPrediction)
print("Mean Square Error (MSE):", meanSquaredError)
rootMeanSquaredError = sqrt(meanSquaredError)
print("Root-Mean-Square Error (RMSE):", rootMeanSquaredError)
fig, ax = plt.subplots()
index_values=range(0,len(y_test))
y_test.sort_index(inplace=True)
X_test.sort_index(inplace=True)
modelPred_test = clf_rf.predict(X_test)
ax.plot(pd.Series(index_values), y_test.values)
# pd.rolling_mean was removed from pandas; use the .rolling() accessor instead
smoothed = pd.Series(modelPred_test).rolling(window=90, min_periods=90, center=False).mean()
PlotInOne=pd.DataFrame(pd.concat([pd.Series(smoothed), pd.Series(y_test.values)], axis=1))
plt.figure(); PlotInOne.plot(); plt.legend(loc='best')
However, the plot of the predicted values (the blue line) looks very coarse, as shown below.
The orange line is a plot of the actual values.
How can we calculate the standard deviation of the prediction (the blue line) in the plot shown above and pass it as the interval (window) parameter of the moving average? Currently, I am setting the size of the moving window manually to 50, but I want to pass the value of the standard deviation instead.
smoothed = pd.Series(modelPred_test).rolling(window=50, min_periods=50, center=False).mean()
smoothed = pd.Series(modelPred_test).rolling(window=50, min_periods=50, center=False).std()
pd.Series(modelPred_test).rolling(window=max(1, int(round(np.std(modelPred_test))))).mean()
You can also plug the standard deviation into the minimum window (min_periods).
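As a minimal sketch of that idea with the current pandas API, the snippet below derives an integer window from the standard deviation of the predictions and uses it for both `window` and `min_periods`. The helper name `std_window` and the synthetic `predictions` array (a stand-in for `modelPred_test`) are assumptions made for illustration and are not part of the original post. Note that the standard deviation is measured in the units of Y rather than in number of samples, so treating it as a window length is only a heuristic.

```python
import numpy as np
import pandas as pd


def std_window(values, min_window=1):
    """Hypothetical helper: round the standard deviation of `values`
    to an integer usable as a rolling window size (at least `min_window`)."""
    sigma = float(np.std(values))
    return max(min_window, int(round(sigma)))


# Synthetic stand-in for modelPred_test: a cubic trend plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(-3, 3, 500)
predictions = t ** 3 + rng.normal(scale=4.0, size=t.size)

window = std_window(predictions)
series = pd.Series(predictions)

# Window and min_periods both set from the standard deviation:
# the first (window - 1) values are NaN because the window is not yet full.
smoothed = series.rolling(window=window, min_periods=window, center=False).mean()

# With min_periods=1 the average starts immediately, using however many
# samples are available so far, so no leading NaN values appear.
smoothed_full = series.rolling(window=window, min_periods=1).mean()

print("window size:", window)
print(smoothed.tail())
```

In the original script, replacing `predictions` with `modelPred_test` should give the smoothed curve for the plot; if the result is still too coarse or too flat, scaling the window by a constant factor is an easy thing to try.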