How to capture trend in time-series data for forecasting using scikit-learn's LinearRegression()
I have read some literature about time series forecasting with ML. I get the concepts of
I would like to use scikit-learn's LinearRegression() as a start to make predictions. If I get it right, I can capture seasonality and cyclic patterns with some feature engineering like day_of_week, month or seasons. What I don't get is how to capture trend in the data. Is it lag features, or a column calculating differences instead of totals?
Linear regression fits the data to a linear model, basically a function Y = W*X with coefficients w = (w1, …, wp), chosen to minimize the residual sum of squares between the true values and their corresponding predicted values.
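As a rough illustration of how such a linear model can pick up a trend, one common approach is to add a running time index as a feature next to the seasonal columns. The sketch below is not from the original answer: the synthetic data, the column name time_step and the 170/30 split are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (assumed for illustration):
# linear trend of 0.5 per day + a weekly pattern + noise.
rng = np.random.default_rng(0)
n = 200
dates = pd.date_range("2021-01-01", periods=n, freq="D")
day_of_week = dates.dayofweek.to_numpy()
weekly_pattern = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0])
y = 0.5 * np.arange(n) + 3.0 * weekly_pattern[day_of_week] + rng.normal(0, 1.0, n)

# One-hot day_of_week columns capture the seasonality;
# the hypothetical time_step column lets the model fit a linear trend.
X = pd.get_dummies(
    pd.Series(day_of_week, name="day_of_week"),
    prefix="dow",
    drop_first=True,
    dtype=float,
)
X.insert(0, "time_step", np.arange(n))

# Split chronologically: never shuffle time-series data.
X_train, X_test = X.iloc[:170], X.iloc[170:]
y_train, y_test = y[:170], y[170:]

model = LinearRegression().fit(X_train, y_train)
trend_slope = model.coef_[0]      # coefficient of time_step
y_pred = model.predict(X_test)
print(f"estimated trend per day: {trend_slope:.3f}")
```

The coefficient on time_step is the estimated change per time step, so the model extrapolates the trend into the test period instead of predicting around the training mean.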
Time-series data is often not linear by nature. To capture seasonality and cyclic patterns, I would suggest using polynomial features, at least of degree n > 2. You can also use more advanced regression models such as support vector and random forest models.
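A minimal sketch of the polynomial idea, assuming a synthetic curved trend and scikit-learn's PolynomialFeatures applied to a rescaled time index. The degree of 3 and the data are illustrative assumptions, not values from the answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic series with a curved trend (assumed for illustration).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 120)  # time rescaled to [0, 1] so powers stay well-conditioned
y = 2.0 + 3.0 * t + 5.0 * t**2 + rng.normal(0, 0.1, t.size)

X = t.reshape(-1, 1)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# Baseline: straight line in time.
linear = LinearRegression().fit(X_train, y_train)

# Polynomial model: expand time into t, t^2, t^3, then fit a linear model on those columns.
poly = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
).fit(X_train, y_train)

linear_mae = np.mean(np.abs(linear.predict(X_test) - y_test))
poly_mae = np.mean(np.abs(poly.predict(X_test) - y_test))
print(f"linear MAE: {linear_mae:.3f}, polynomial MAE: {poly_mae:.3f}")
```

The model is still linear in its coefficients, so LinearRegression() can fit it. Keep in mind that high-degree polynomials can extrapolate badly beyond the training range, so it is worth checking the fit on a held-out tail of the series.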
But for sure, you can start with a linear model. Later, once you have seen the limitations of linear models, you can easily shift to other, more advanced models.
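The question also asks about lag features and differences. Both can be fed to LinearRegression() as ordinary columns. The sketch below builds them with pandas on a synthetic trending series; the column names lag_1 and diff_lag_1 and the data are assumptions for illustration. The difference is taken on the lagged values so that the target itself never leaks into the features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic trending series (assumed for illustration).
rng = np.random.default_rng(2)
y = pd.Series(10.0 + 0.3 * np.arange(150) + rng.normal(0, 0.5, 150), name="y")

frame = pd.DataFrame(
    {
        "y": y,
        "lag_1": y.shift(1),              # previous value y[t-1]
        "diff_lag_1": y.shift(1).diff(),  # previous change y[t-1] - y[t-2]
    }
).dropna()  # the first two rows have no complete history

X = frame[["lag_1", "diff_lag_1"]]
target = frame["y"]

# Chronological split for one-step-ahead prediction.
X_train, X_test = X.iloc[:120], X.iloc[120:]
y_train, y_test = target.iloc[:120], target.iloc[120:]

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = float(np.mean(np.abs(y_pred - y_test.to_numpy())))
print(f"one-step-ahead MAE: {mae:.3f}")
```

With a lag feature the model predicts the next value relative to the last observed one, which lets it follow a trend without an explicit time index. Note that this setup only predicts one step ahead; forecasting further requires feeding predictions back in as lags.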
Check out sktime + sklearn for forecasting: you can perform most time-series analysis with them. This example, from my gist, shows how you can ensemble two models to predict trends:
```python
# Written against an older sktime release; later releases replace
# ReducedForecaster with make_reduction, so adjust to your installed version.
from pytrends.request import TrendReq
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import EnsembleForecaster, ReducedForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
from sktime.utils.plotting import plot_series

# fetch cyberbullying data from Google Trends
pytrend = TrendReq(hl="en-US")
pytrend.build_payload(
    kw_list=[
        "cyberbullying",
    ]
)
cyberbullying_df = pytrend.interest_over_time()

# transform the DataFrame into a univariate series with a weekly PeriodIndex
fow = cyberbullying_df["cyberbullying"].to_period(freq="W")

# hold out the last 36 weeks as the test set
y_train, y_test = temporal_train_test_split(fow, test_size=36)
fh = ForecastingHorizon(y_test.index, is_relative=False)

# forecaster ensemble of a kNN and a gradient boosting regressor
forecaster = EnsembleForecaster(
    [
        (
            "knn",
            ReducedForecaster(
                regressor=KNeighborsRegressor(n_neighbors=1),
                window_length=52,
                strategy="recursive",
                scitype="regressor",
            ),
        ),
        (
            "gboost",
            ReducedForecaster(
                regressor=GradientBoostingRegressor(n_estimators=100, random_state=42),
                window_length=52,
                strategy="recursive",
                scitype="regressor",
            ),
        ),
    ]
)

# train the ensemble forecaster and predict over the forecasting horizon
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
```
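The excerpt imports smape_loss but stops before scoring the forecast. As a version-independent sketch of what a symmetric MAPE measures, here is one common definition written with NumPy. The function name smape and the toy numbers are assumptions for illustration, and the exact formula used by a given sktime release may differ.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE: mean of 2*|y - y_hat| / (|y| + |y_hat|), in [0, 2]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Where both values are zero the forecast is exact, so the term is set to 0.
    ratio = np.divide(
        2.0 * np.abs(y_true - y_pred),
        denom,
        out=np.zeros_like(denom),
        where=denom != 0,
    )
    return float(ratio.mean())


# Toy values standing in for y_test and y_pred.
actual = [100.0, 80.0, 60.0]
forecast = [110.0, 80.0, 50.0]
score = smape(actual, forecast)
print(f"sMAPE: {score:.4f}")
```

Lower is better, and 0 means a perfect forecast. In the example above you would pass y_test and y_pred in place of the toy lists.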
sktime also allows you to use Facebook's prophet. Give it a go, as it's my tool for doing time-series analysis: sktime