Pandas moving average using a standard deviation in Python

I want to smooth the noise with a moving average filter after fitting a regression model with a RandomForestRegressor to a data set I am considering using, found in this link:

import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import GridSearchCV, train_test_split


n_features = 3000

df = pd.read_csv('cubic32.csv')

# Lagged copies of X as features: X_t1 = X shifted by 1 row, ..., X_t2999
for i in range(1, n_features):
    df['X_t' + str(i)] = df['X'].shift(i)

print(df)

# The first n_features - 1 rows have NaN lags; drop them
df.dropna(inplace=True)


X = df.drop('Y', axis=1)
y = df['Y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# 'time' is only an identifier, not a feature
X_train = X_train.drop('time', axis=1)
X_test = X_test.drop('time', axis=1)


parameters = {'n_estimators': [10]}
clf_rf = RandomForestRegressor(random_state=1)
clf = GridSearchCV(clf_rf, parameters, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
model = clf.fit(X_train, y_train)

# grid_scores_ was removed in scikit-learn 0.20; cv_results_ replaces it
print()
print("Best parameters:", model.best_params_)
print("Mean CV scores:", model.cv_results_['mean_test_score'])
print("The best score:", model.best_score_)
# best_score_ is the negated MSE, so flip the sign before taking the root
print("RMSE:", math.sqrt(-model.best_score_))

# With refit=True (the default) GridSearchCV refits the best parameters on the
# whole training set, so predict with that estimator instead of refitting clf_rf
best_rf = model.best_estimator_
modelPrediction = best_rf.predict(X_test)
print(modelPrediction)

print("Number of predictions:", len(modelPrediction))

meanSquaredError = mean_squared_error(y_test, modelPrediction)
print("Mean Square Error (MSE):", meanSquaredError)
rootMeanSquaredError = math.sqrt(meanSquaredError)
print("Root-Mean-Square Error (RMSE):", rootMeanSquaredError)

# train_test_split shuffles rows; restore time order before plotting
y_test = y_test.sort_index()
X_test = X_test.sort_index()

modelPred_test = best_rf.predict(X_test)

# pd.rolling_mean was removed in pandas 0.23; use Series.rolling(...).mean()
smoothed = pd.Series(modelPred_test).rolling(window=90, min_periods=90).mean()
PlotInOne = pd.DataFrame({'smoothed prediction': smoothed,
                          'actual': y_test.values})
ax = PlotInOne.plot()
ax.legend(loc='best')
plt.show()
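Adding roughly 3,000 lagged columns one at a time (the loop over df['X'].shift(i) above) is slow, and recent pandas versions may warn that the DataFrame is highly fragmented. A minimal sketch of building the same X_t1 ... X_t2999 columns with a single pd.concat call follows; the helper name add_lag_features and the toy DataFrame are hypothetical stand-ins for cubic32.csv, not part of the question.

```python
import numpy as np
import pandas as pd


def add_lag_features(df: pd.DataFrame, column: str, n_features: int) -> pd.DataFrame:
    """Hypothetical helper: append lagged copies of `column` named
    f"{column}_t1" ... f"{column}_t{n_features - 1}", matching the loop
    df['X_t' + str(i)] = df['X'].shift(i) for i in range(1, n_features)."""
    lags = pd.concat(
        {f"{column}_t{i}": df[column].shift(i) for i in range(1, n_features)},
        axis=1,
    )
    # One concat instead of thousands of single-column inserts
    return pd.concat([df, lags], axis=1)


# Toy stand-in for cubic32.csv (hypothetical data)
toy = pd.DataFrame({
    "time": np.arange(6),
    "X": np.arange(6, dtype=float) ** 3,
    "Y": np.arange(6, dtype=float),
})
lagged = add_lag_features(toy, "X", n_features=4).dropna()
print(lagged.columns.tolist())  # ['time', 'X', 'Y', 'X_t1', 'X_t2', 'X_t3']
```

The dict keys become the column names, so the result has the same layout as the loop version; dropna() then removes the first n_features - 1 rows exactly as before.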

However, the plot of the predicted values (the blue line, shown below) looks very coarse.

The orange line shows the actual values.

[Figure: smoothed predictions (blue line) and actual values (orange line)]

How can I calculate the standard deviation of the predictions (the blue line) in the plot above and pass it to the moving average as the window-size parameter? Currently, I set the size of the moving window manually to 50, but I want to pass the value of the standard deviation instead.
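A minimal sketch of what is being asked: compute the standard deviation of the predictions with np.std, turn it into a valid window length, and pass it to Series.rolling. The helper names std_window and smooth_with_std_window and the synthetic predictions are hypothetical. Note that the standard deviation is in the units of Y while the window is a count of samples, so this is a heuristic: rescaling Y (for example, changing its units) changes the window length.

```python
import numpy as np
import pandas as pd


def std_window(values, minimum: int = 1) -> int:
    """Hypothetical helper: the rounded (population, ddof=0) standard deviation
    of `values`, used as a rolling-window length. A rolling window must be a
    positive integer, so the result is rounded and clamped to >= `minimum`."""
    return max(minimum, int(round(float(np.std(values)))))


def smooth_with_std_window(predictions) -> pd.Series:
    """Trailing moving average whose window length is std_window(predictions)."""
    preds = pd.Series(predictions, dtype=float)
    window = std_window(preds.to_numpy())
    return preds.rolling(window=window, min_periods=window).mean()


# Synthetic stand-in for modelPred_test (hypothetical data): trend plus noise
rng = np.random.default_rng(0)
t = np.arange(500)
modelPred_test = 0.05 * t + rng.normal(scale=5.0, size=t.size)

window = std_window(modelPred_test)
smoothed = smooth_with_std_window(modelPred_test)
print("window length from std:", window)
```

With min_periods equal to the window, the first window - 1 smoothed values are NaN, just as with the fixed window of 50.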

Answer:

preds = pd.Series(modelPred_test)  # pd.rolling_mean and a top-level pd.rolling no longer exist
smoothed = preds.rolling(window=50, min_periods=50).mean()
smoothed_std = preds.rolling(window=50, min_periods=50).std()
# Window length from the std of the predictions (must be a positive integer)
smoothed = preds.rolling(window=max(1, int(round(np.std(modelPred_test))))).mean()
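The second line of the answer computes a rolling standard deviation. One way to use it as an "interval", assuming that is what the question means, is a band of the rolling mean plus or minus k rolling standard deviations over the same window. The helper name rolling_band and the choice of k are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd


def rolling_band(values, window: int, k: float = 2.0) -> pd.DataFrame:
    """Hypothetical helper: trailing rolling mean with a +/- k * rolling-std band.

    Uses pandas' default sample standard deviation (ddof=1) inside each window.
    """
    s = pd.Series(values, dtype=float)
    roll = s.rolling(window=window, min_periods=window)
    mean = roll.mean()
    std = roll.std()
    return pd.DataFrame({"mean": mean, "lower": mean - k * std, "upper": mean + k * std})


band = rolling_band([1.0, 2.0, 3.0, 4.0, 5.0], window=3, k=1.0)
print(band)
```

For the window [1, 2, 3] the mean is 2 and the sample standard deviation is 1, so that row of the band runs from 1 to 3; the first two rows are NaN because fewer than three points are available.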

You can also plug the standard deviation into the minimum window size (min_periods) instead.
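A minimal sketch of that last suggestion, assuming the fixed window of 50 from the question: min_periods sets how many observations a window needs before a value is produced, and pandas raises a ValueError when min_periods is larger than window, so the derived value is clamped to the range [1, window]. The helper name smooth_with_std_min_periods and the input values are hypothetical.

```python
import numpy as np
import pandas as pd


def smooth_with_std_min_periods(predictions, window: int = 50) -> pd.Series:
    """Hypothetical helper: fixed-length trailing mean whose min_periods is the
    rounded standard deviation of the predictions, clamped to [1, window]
    (pandas raises ValueError if min_periods > window)."""
    preds = pd.Series(predictions, dtype=float)
    min_periods = int(round(float(np.std(preds.to_numpy()))))
    min_periods = min(window, max(1, min_periods))
    return preds.rolling(window=window, min_periods=min_periods).mean()


values = np.arange(100.0)  # hypothetical predictions; np.std(values) is about 28.87
smoothed = smooth_with_std_min_periods(values, window=50)
print(int(smoothed.isna().sum()))  # 28: the first value appears once 29 points are available
```

Here the window length stays fixed, so the amount of smoothing does not depend on the scale of Y; only the number of leading NaNs does.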
