
K-fold Cross Validation with RandomForest

I'm trying to use RandomForest for prediction while using k-fold cross-validation to pick the min_samples_leaf value that minimizes my cross-validation error. I'm having trouble setting up my code because I keep running into an error when I get to train_x = x[train_index]. The error I get is displayed below.

from sklearn import ensemble, model_selection
kf = model_selection.KFold(n_splits=5)

x = train
y = test

for m in range(0, 10): # vary min_samples_leaf

    dtr = ensemble.RandomForestRegressor(n_estimators = 15, min_samples_leaf = m, max_features = 10, criterion = 'mse')

    for train_index, test_index in kf.split(x):
        print("TRAIN:", train_index, "TEST:", test_index)
        train_x = x[train_index]
        train_y = y[test_index]
        regr = dtr.fit(train_x, train_y)

KeyError:

None of [Int64Index([15546, 15547, 15548, 15549, 15550, 15551, 15552, 15553, 15554,
            15555,
            ...
            77718, 77719, 77720, 77721, 77722, 77723, 77724, 77725, 77726,
            77727],
           dtype='int64', length=62182)] are in the [columns]

kf.split() yields arrays of integer indices, and the values in train_index that you pass to x[train_index] simply aren't found in x.

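The loop itself looks reasonable, so I suspect the problem is the format of the data in "train" (and therefore "x"). If it is a pandas DataFrame rather than a NumPy array, x[train_index] looks the values up as column labels instead of selecting rows, which would match the "are in the [columns]" part of the message.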
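To make the difference concrete, here is a minimal sketch, assuming x is a pandas DataFrame. The toy frame and its column names "a" and "b" are made up for illustration and are not the poster's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy frame standing in for `train`; column names are invented.
x = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})

kf = KFold(n_splits=5)
train_index, test_index = next(kf.split(x))  # arrays of integer row positions

try:
    x[train_index]  # DataFrame[...] treats the values as column labels
    raised = False
except KeyError:
    raised = True

rows = x.iloc[train_index]  # .iloc selects rows by integer position
print(raised, rows.shape)
```

With a plain NumPy array, x[train_index] would select rows directly, which is probably why the same loop works in many tutorials.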

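The error mentions Int64Index (the pandas index type, IIRC) and says that none of those values are in the [columns], so the lookup is being done against the column labels of x rather than its rows. Note that length=62182 is the number of indices in train_index, not an upper bound on x. So the thing to check is how your original data is stored and indexed.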
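Putting this together, one possible version of the loop is sketched below. It is only a sketch under several assumptions: the features are in a DataFrame and the target is in a Series aligned row for row with it (the synthetic x and y here are stand-ins, not the poster's train and test), both are sliced with the same train_index, and min_samples_leaf starts at 1 because 0 is not a valid value. max_features=10 and criterion='mse' are left out because the toy data has only four features and the criterion name depends on the scikit-learn version (recent releases call it 'squared_error').

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-ins for the real data (assumption: DataFrame features, Series target).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
y = pd.Series(x["f0"] * 2.0 + rng.normal(scale=0.1, size=200))

kf = KFold(n_splits=5)
cv_error = {}  # min_samples_leaf -> mean squared error averaged over folds

for m in range(1, 10):  # vary min_samples_leaf, starting at 1
    fold_errors = []
    for train_index, test_index in kf.split(x):
        # Fit a fresh model on the training rows of this fold.
        model = RandomForestRegressor(
            n_estimators=15, min_samples_leaf=m, random_state=0
        )
        model.fit(x.iloc[train_index], y.iloc[train_index])
        # Score on the held-out rows of the same fold.
        pred = model.predict(x.iloc[test_index])
        fold_errors.append(mean_squared_error(y.iloc[test_index], pred))
    cv_error[m] = float(np.mean(fold_errors))

best_m = min(cv_error, key=cv_error.get)
print(best_m, cv_error[best_m])
```

If the target really lives in a separate object, as y = test in the question suggests, it needs to have one value per row of x for this slicing to make sense.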
