How to capture trend in time-series data for forecasting using scikit-learn's LinearRegression()

I have read some literature about time series forecasting with ML. I understand the concepts of

  1. trend
  2. seasonality
  3. cyclic patterns
  4. noise

I would like to use scikit-learn's LinearRegression() as a starting point for making predictions. If I understand correctly, I can capture seasonality and cyclic patterns with some feature engineering, such as day_of_week, month, or season features. What I don't understand is how to capture the trend in the data. Is it done with lag features, or with a column that holds differences instead of totals?

Linear regression fits the data to a linear model, essentially a function Y = W*X with coefficients w = (w1, …, wp) chosen to minimize the residual sum of squares between the true values and the corresponding predicted values.
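One common way to let such a model see the trend is to give it an explicit time-index column, so that the fitted coefficient on that column is the slope of the trend. The sketch below is only an illustration under stated assumptions: the daily series is synthetic, and the helper name make_features and the column names t and dow_1 to dow_6 are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (an assumption for illustration):
# linear trend + weekend bump + Gaussian noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=200, freq="D")
t = np.arange(len(dates))
weekend_bump = np.where(dates.dayofweek >= 5, 5.0, 0.0)
y = 10.0 + 0.5 * t + weekend_bump + rng.normal(0.0, 1.0, len(dates))


def make_features(index, start):
    """Hypothetical helper: time index (trend) plus day-of-week dummies (seasonality)."""
    X = pd.DataFrame({"t": np.asarray((index - start).days)}, index=index)
    # Day 0 (Monday) is left out so the dummies are not collinear with the intercept.
    for d in range(1, 7):
        X[f"dow_{d}"] = (index.dayofweek == d).astype(float)
    return X


X = make_features(dates, dates[0])

# Split in time order: never shuffle a time series before splitting.
X_train, X_test = X.iloc[:170], X.iloc[170:]
y_train, y_test = y[:170], y[170:]

model = LinearRegression().fit(X_train, y_train)
trend_slope = model.coef_[0]       # coefficient on the "t" column
y_pred = model.predict(X_test)     # the trend is extrapolated through "t"
mae = np.mean(np.abs(y_test - y_pred))
print(f"estimated slope per day: {trend_slope:.3f}, test MAE: {mae:.3f}")
```

Because future values of t are known in advance, the model can extrapolate the trend without needing past target values at prediction time, which is the main practical difference from lag features.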

Time-series data is often not linear by nature. To capture seasonality and cyclic patterns, I would suggest using a polynomial function of degree n > 2. You can also use more advanced regression models, such as support vector and random forest models.
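A polynomial fit can still be done with LinearRegression() by expanding the time index with scikit-learn's PolynomialFeatures. The following is a minimal sketch on a synthetic cubic series (an assumption for illustration); the time index is scaled to [0, 1] to keep the powers numerically well behaved.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic series with a curved trend (an assumption for illustration).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 120).reshape(-1, 1)  # scaled time index as the single feature
y = 2.0 + 3.0 * t.ravel() ** 3 + rng.normal(0.0, 0.05, 120)

# Straight-line fit versus a degree-3 polynomial fit on the same feature.
linear = LinearRegression().fit(t, y)
cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(t, y)

r2_linear = linear.score(t, y)
r2_cubic = cubic.score(t, y)
print(f"R^2 linear: {r2_linear:.3f}, R^2 cubic: {r2_cubic:.3f}")
```

Polynomials of the time index tend to extrapolate poorly beyond the training range, so it is worth checking forecasts on a held-out tail of the series before relying on a high degree.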

You can certainly start with a linear model, though. Once you see its limitations, you can easily move to more advanced models.

Check out sktime + sklearn for forecasting: you can perform most time-series analysis with them. The example below, from my gist, shows how to ensemble two models to predict trends:

from pytrends.request import TrendReq
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import EnsembleForecaster, make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split

# NOTE: the original gist used the ReducedForecaster class from an older sktime
# release; later releases provide make_reduction instead. Import paths have
# shifted between sktime versions, so adjust them to the installed one.

# Fetch "cyberbullying" search interest from Google Trends
pytrend = TrendReq(hl="en-US")
pytrend.build_payload(kw_list=["cyberbullying"])
cyberbullying_df = pytrend.interest_over_time()

# Convert the DataFrame column to a univariate Series with a weekly PeriodIndex
fow = cyberbullying_df["cyberbullying"].to_period(freq="W")

# Hold out the last 36 periods and forecast exactly those
y_train, y_test = temporal_train_test_split(fow, test_size=36)
fh = ForecastingHorizon(y_test.index, is_relative=False)

# Ensemble of a KNN and a gradient boosting regressor, each reduced to
# tabular regression over a sliding window of 52 observations
forecaster = EnsembleForecaster(
    [
        (
            "knn",
            make_reduction(
                KNeighborsRegressor(n_neighbors=1),
                window_length=52,
                strategy="recursive",
            ),
        ),
        (
            "gboost",
            make_reduction(
                GradientBoostingRegressor(n_estimators=100, random_state=42),
                window_length=52,
                strategy="recursive",
            ),
        ),
    ]
)

# Fit the ensemble on the training series and forecast the held-out horizon
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
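To judge a forecast against the held-out values, a symmetric MAPE is one option. Below is a minimal NumPy sketch, assuming the common definition mean(2 * |y - y_hat| / (|y| + |y_hat|)); the function name smape and the toy numbers are hypothetical, and sktime's own metric may handle edge cases differently.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE under the assumed definition mean(2|y - y_hat| / (|y| + |y_hat|))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Where both values are zero the error is taken to be zero (a design choice).
    ratio = np.divide(
        2.0 * np.abs(y_true - y_pred),
        denom,
        out=np.zeros_like(denom),
        where=denom != 0,
    )
    return float(ratio.mean())


# Toy values standing in for y_test and y_pred.
actual = [100.0, 200.0, 0.0]
forecast = [110.0, 180.0, 0.0]
score = smape(actual, forecast)
print(f"sMAPE: {score:.4f}")
```

With the forecaster above, the same function could be called as smape(y_test, y_pred), since both are Series over the same periods.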

sktime also lets you use Facebook's prophet. Give it a go; sktime is my go-to tool for time-series analysis.
