
K-fold cross-validation to reduce overfitting: problem with the implementation

This is the first time I am trying to use cross-validation, and I am running into an error.

First, my dataset looks like this:

(screenshot of the dataset)

So, in order to avoid or at least reduce overfitting in my model, I am trying to use k-fold cross-validation:

from sklearn.model_selection import KFold

X, y = creation_X_y()  # function that cleans my data and returns X and y
kf = KFold(n_splits=5)

for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:",test_index)
    X_train = X[train_index]
    X_test = X[test_index]
    y_train, y_test = y[train_index], y[test_index]

However, I am facing the following error and I cannot figure out how to solve it. My understanding is that it looks for these values in the columns, but shouldn't it look in the index instead? Could I use X.loc[train_index], for example?

Thanks in advance for your time and your help !

(screenshot of the error traceback)

Your assumption is correct: .iloc[index] will work. Here is the code:

from sklearn.model_selection import KFold

X, y = creation_X_y()  # function that cleans my data and returns X and y
kf = KFold(n_splits=5)

for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:",test_index)
    X_train = X.iloc[train_index]
    X_test = X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

Another way is to make creation_X_y() return a numpy.array, since plain integer-array indexing with [] works on NumPy arrays.
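Both approaches can be sketched as follows. This is a minimal runnable example with synthetic data standing in for creation_X_y() (which is not shown in the post): .iloc handles positional indexing on pandas objects, while converting to NumPy arrays makes plain [] indexing valid.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for the data returned by creation_X_y()
X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series(range(10))

kf = KFold(n_splits=5)

# Option 1: keep pandas objects and index positionally with .iloc
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

# Option 2: convert to NumPy arrays first; then [] indexing works directly
X_arr, y_arr = X.to_numpy(), y.to_numpy()
for train_index, test_index in kf.split(X_arr):
    X_train, X_test = X_arr[train_index], X_arr[test_index]
    y_train, y_test = y_arr[train_index], y_arr[test_index]
```

Note that KFold.split only yields positional indices (0 to n_samples-1), which is why label-based lookup on a DataFrame with X[train_index] fails: it is interpreted as column selection.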
