How to reshape a 3D tensor created for multistep, multivariate time series forecasting in Keras so it can be used in XGBoost?

I developed a model in Keras that maps an input matrix x (168x326) to an output vector y (168x1). Input x is week i, containing 326 features generated hourly over 168 hours. Output y is week i+1, containing 168 hourly prices. The training set contains 208 pairs of weeks (x_train -> y_train), and the test set contains 51 pairs (x_test -> y_test). The datasets are 3D tensors with the following shapes:

print(x_train.shape)
print(y_train.shape)

print(x_test.shape)
print(y_test.shape)

Output:

x_train: (208, 168, 326)
y_train: (208, 168, 1)
x_test: (51, 168, 326)
y_test: (51, 168, 1)

I want to use these exact same datasets to predict prices with XGBoost. My model is built like this:

import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=1000)
reg.fit(x_train, y_train,
        eval_set=[(x_train, y_train), (x_test, y_test)],
        early_stopping_rounds=50,
        verbose=True)

However, when I run it, I get an error saying that XGBoost expects 2D input:

ValueError: Please reshape the input data into 2-dimensional matrix.

I have tried removing or reshaping dimensions in the datasets, but without success. Could someone tell me how to perform this conversion on the data? Thanks.

First I needed to flatten the last two dimensions into one, which reduces the tensors from 3D to 2D. My tensors now have the following shapes: x_train: (208, 54768), y_train: (208, 168), x_test: (51, 54768) and y_test: (51, 168). Next, I discovered that these regressors do not support multi-valued outputs by default. To handle that, you need to import MultiOutputRegressor:

from sklearn.multioutput import MultiOutputRegressor

Then wrap the regressor in it, like this:

from xgboost import XGBRegressor
reg = MultiOutputRegressor(XGBRegressor())

I tested it with XGB and LGBM and it worked well. However, if you are using CatBoost, it is better to format your data with the CatBoost library's own Pool class. It is imported like this:

from catboost import CatBoostRegressor, Pool

The code looks like this:

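# Pool bundles the flattened 2D features with the 2D multi-target labels.
dtrain = Pool(x_train, label=y_train)
# MultiRMSE is CatBoost's loss function for multi-target regression.
params = {'iterations': 500, 'learning_rate': 0.1, 'depth': 3, 'loss_function': 'MultiRMSE'}

CAT_reg = CatBoostRegressor(**params)
CAT_reg.fit(dtrain)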
