How to reshape a 3D tensor created for multistep, multivariate time series forecasting in Keras so it can be used with XGBoost?
I developed a model in Keras that maps an input matrix x (168x326) to an output vector y (168x1). Input x is week i, containing 326 features generated hourly for 168 hours. Output y is week i+1, containing 168 hourly prices. The training set contains 208 pairs of weeks (x_train -> y_train), while the test set contains 51 pairs (x_test -> y_test). The datasets are 3D tensors with the following shapes:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
Output:
x_train: (208, 168, 326)
y_train: (208, 168, 1)
x_test: (51, 168, 326)
y_test: (51, 168, 1)
I want to use these exact same datasets to perform price prediction with XGBoost. My model is built like this:
import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=1000)
reg.fit(x_train, y_train,
        eval_set=[(x_train, y_train), (x_test, y_test)],
        early_stopping_rounds=50,
        verbose=True)
However, when I run it, I get an error saying that XGBoost expects 2D input:
ValueError: Please reshape the input data into 2-dimensional matrix.
I have tried removing or reshaping dimensions in the datasets, but without success. Could someone tell me how to perform this conversion on the data? Thanks.
First, I needed to flatten the last two dimensions into one, which reduces each tensor from 3D to 2D. My tensors now have the following shapes: x_train: (208, 54768), y_train: (208, 168), x_test: (51, 54768) and y_test: (51, 168).
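The flattening step could be sketched with NumPy as follows. This is only an illustration under stated assumptions: the arrays are random placeholders with the shapes given in the question, and `flatten_weeks` is a hypothetical helper name, not something from the original post.

```python
import numpy as np

# Placeholder arrays with the shapes from the question (random values, not real data).
rng = np.random.default_rng(0)
x_train = rng.random((208, 168, 326), dtype=np.float32)
y_train = rng.random((208, 168, 1), dtype=np.float32)
x_test = rng.random((51, 168, 326), dtype=np.float32)
y_test = rng.random((51, 168, 1), dtype=np.float32)


def flatten_weeks(a):
    """Collapse every axis after the first, so each week becomes one row."""
    return a.reshape(a.shape[0], -1)


x_train_2d = flatten_weeks(x_train)  # 168 hours * 326 features per row
y_train_2d = flatten_weeks(y_train)  # 168 hourly prices per row
x_test_2d = flatten_weeks(x_test)
y_test_2d = flatten_weeks(y_test)

for name, arr in [("x_train", x_train_2d), ("y_train", y_train_2d),
                  ("x_test", x_test_2d), ("y_test", y_test_2d)]:
    print(name, arr.shape)
```

Each row of the flattened x array holds the 326 features of hour 0, then those of hour 1, and so on, because `reshape` keeps NumPy's default row-major order.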
Next, I discovered that these regressors do not handle multi-valued outputs by default. To support them, it is necessary to import MultiOutputRegressor:
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor
Then wrap the regressor in it, like this:
reg = MultiOutputRegressor(XGBRegressor())
I tested this with XGBoost and LightGBM and it worked well. However, if you are using CatBoost, it is better to format your data with the CatBoost library's own Pool class:
from catboost import CatBoostRegressor, Pool
The code looks like this:
dtrain = Pool(x_train, label=y_train)
params = {'iterations': 500, 'learning_rate': 0.1, 'depth': 3, 'loss_function': 'MultiRMSE'}
CAT_reg = CatBoostRegressor(**params)
CAT_reg.fit(dtrain)