
K-fold cross-validation to reduce overfitting: problem with the implementation

This is the first time I am trying to use cross-validation, and I am running into an error.

First, my dataset looks like this:

(screenshot of the dataset)

So, in order to avoid or at least reduce overfitting in my model, I am trying to use k-fold cross-validation:

from sklearn.model_selection import KFold

X, y = creation_X_y()  # function that cleans my data and returns X and y
kf = KFold(n_splits=5)

for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:",test_index)
    X_train = X[train_index]
    X_test = X[test_index]
    y_train, y_test = y[train_index], y[test_index]

However, I am facing the following error and I cannot figure out how to solve it. My understanding is that it looks for these values in the columns, but shouldn't it look in the index instead? Could I use X.loc[train_index], for example?

Thanks in advance for your time and your help !

(screenshot of the error traceback)

Your assumption is correct: .iloc[index] will work. Here is the code:

from sklearn.model_selection import KFold

X, y = creation_X_y()  # function that cleans my data and returns X and y
kf = KFold(n_splits=5)

for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:",test_index)
    X_train = X.iloc[train_index]
    X_test = X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

Another way is to make creation_X_y() return a numpy.array, since plain integer-array indexing with [] works on NumPy arrays.
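Both approaches can be sketched as follows. This is a minimal runnable example with synthetic data standing in for creation_X_y() (which is not shown in the post): .iloc handles positional indexing on pandas objects, while converting to NumPy arrays makes plain [] indexing valid.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for the data returned by creation_X_y()
X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series(range(10))

kf = KFold(n_splits=5)

# Option 1: keep pandas objects and index positionally with .iloc
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

# Option 2: convert to NumPy arrays first; then [] indexing works directly
X_arr, y_arr = X.to_numpy(), y.to_numpy()
for train_index, test_index in kf.split(X_arr):
    X_train, X_test = X_arr[train_index], X_arr[test_index]
    y_train, y_test = y_arr[train_index], y_arr[test_index]
```

Note that KFold.split only yields positional indices (0 to n_samples-1), which is why label-based lookup on a DataFrame with X[train_index] fails: it is interpreted as column selection.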
