K-fold cross-validation with RandomForest

Question

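I'm trying to use RandomForest for a prediction task, and I'm also using k-fold cross-validation to find the min_samples_leaf that minimizes my cross-validation error. I'm having trouble setting up the code: I keep getting an error when I reach train_x = x[train_index]. The error is shown below.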

from sklearn import ensemble, model_selection
kf = model_selection.KFold(n_splits=5)

x = train
y = test

for m in range(0, 10): # vary min_samples_leaf

    dtr = ensemble.RandomForestRegressor(n_estimators = 15, min_samples_leaf = m, max_features = 10, criterion = 'mse')

    for train_index, test_index in kf.split(x):
        print("TRAIN:", train_index, "TEST:", test_index)
        train_x = x[train_index]
        train_y = y[test_index]
        regr = dtr.fit(train_x, train_y)

KeyError:

None of [Int64Index([15546, 15547, 15548, 15549, 15550, 15551, 15552, 15553, 15554,\n            15555,\n            ...\n            77718, 77719, 77720, 77721, 77722, 77723, 77724, 77725, 77726,\n            77727],\n           dtype='int64', length=62182)] are in the [columns]

Answer 1

kf.split() hands you a whole batch of values, and the train_index values you pass in x[train_index] are simply not found in x.
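To make it concrete what those values are, here is a minimal sketch on toy data (the array `data` is a hypothetical stand-in, not the asker's dataset). It shows that `KFold.split` yields integer row positions, not the rows themselves:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 rows, 2 columns.
data = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5)

# Each fold is a pair of integer position arrays (train, test).
folds = list(kf.split(data))
for train_index, test_index in folds:
    print("TRAIN:", train_index, "TEST:", test_index)
```

Without shuffling, the first fold holds out the first block of rows, which is consistent with the train indices in the question starting at 15546 rather than 0.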

The code looks right, so I suspect something is off with the format of the data in "train" (and therefore in "x")?
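One format issue that would produce exactly this message: the phrase "are in the [columns]" comes from pandas, which suggests x is a DataFrame. A small sketch, with a hypothetical frame `df`, of how indexing a DataFrame with an integer array looks up column labels rather than rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's `train`.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})
positions = np.array([2, 3])

try:
    df[positions]  # looks for COLUMNS labelled 2 and 3, not rows
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Positional row selection works with .iloc ...
rows = df.iloc[positions]
# ... or by converting to a NumPy array first.
arr_rows = df.to_numpy()[positions]
```

If x were a plain NumPy array, x[train_index] would select rows as the question's code intends.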

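The error says that the values in your Int64Index (the pandas index type, IIRC) are larger than those of x (max length 62182), so something must be wrong with your original data.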
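If the underlying cause is that x is a pandas DataFrame, the loop could be reworked roughly as follows. This is a sketch under assumptions, not a confirmed fix: the data here is synthetic, y is assumed to hold the targets for the same rows as x, and the `cv_error` bookkeeping is added for illustration. Apart from switching to `.iloc`, it changes three things from the question: the training targets are selected with train_index (the question used test_index), min_samples_leaf starts at 1 because 0 is not a valid value, and `criterion='mse'` is omitted since newer scikit-learn releases call it `'squared_error'`, which is the default. `max_features` is left at its default as well.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption): 200 rows, 10 features.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(200, 10)),
                 columns=[f"f{i}" for i in range(10)])
y = pd.Series(x["f0"] * 2.0 + rng.normal(scale=0.1, size=200))

kf = KFold(n_splits=5)
cv_error = {}  # min_samples_leaf -> mean validation MSE

for m in range(1, 10):  # min_samples_leaf must be at least 1
    fold_errors = []
    for train_index, test_index in kf.split(x):
        # .iloc selects rows by position, matching what kf.split yields.
        train_x, test_x = x.iloc[train_index], x.iloc[test_index]
        train_y, test_y = y.iloc[train_index], y.iloc[test_index]

        model = RandomForestRegressor(n_estimators=15,
                                      min_samples_leaf=m,
                                      random_state=0)
        model.fit(train_x, train_y)
        fold_errors.append(
            mean_squared_error(test_y, model.predict(test_x)))
    cv_error[m] = float(np.mean(fold_errors))

best_m = min(cv_error, key=cv_error.get)
print("CV error per min_samples_leaf:", cv_error)
print("best min_samples_leaf:", best_m)
```

Creating a fresh model inside the inner loop keeps each fold's fit independent, and scoring on the held-out rows is what gives a cross-validation error to compare across values of min_samples_leaf.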

Solution 1
0 2019-10-09 01:30:35
