
Randomly define the training size in train_test_split sklearn

I am trying to split my data into 40% training and 60% validation, and I want to repeat this 30 times, each time with a different random training and validation split. How can I do this? (not using KFold)

This is what I wrote, but I get the same accuracy on every iteration, and I don't know why or how to get a different training and validation split each time.

for i in range(30):
    X_train, X_test, y_train, y_test = train_test_split(
        df, y, train_size=0.4, shuffle=True)
    metrics.accuracy_score(y_train, linsvc.predict(X_train))

To get a random training size on each of the 30 iterations, you can draw a random number each time and pass it as the training-set fraction.


Use this:

from sklearn.model_selection import train_test_split
import random
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([1, 2, 1, 2, 1, 2])

for i in range(30):
    # the training size varies randomly between 0.2 and 0.5
    random_portion = round(random.uniform(0.2, 0.5), 3)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=random_portion, shuffle=True)

You can modify the code accordingly.
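Note that the constant accuracy in the question comes from never refitting the classifier inside the loop: the splits change, but the model does not. A minimal sketch of the full loop, assuming a `LinearSVC` as in the question (synthetic data stands in for the asker's `df` and `y`), refitting on each new split and scoring on the held-out portion:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn import metrics

# synthetic data standing in for the asker's df / y
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
for i in range(30):
    # no random_state is passed, so every iteration gets a fresh 40/60 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.4, shuffle=True)
    linsvc = LinearSVC(max_iter=5000).fit(X_train, y_train)  # refit each time
    scores.append(metrics.accuracy_score(y_test, linsvc.predict(X_test)))

print(min(scores), max(scores))  # accuracy now varies across iterations
```

The key change is calling `fit` inside the loop and evaluating on `X_test`, not `X_train`.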


EDIT 1

You can draw the random fraction with NumPy instead of the `random` module.

from sklearn.model_selection import train_test_split
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([1, 2, 1, 2, 1, 2])

for i in range(30):
    # keep the fraction in (0.2, 0.5); a bare np.random.rand() can return
    # values near 0 or 1, which make train_size invalid for small datasets
    random_portion = round(np.random.uniform(0.2, 0.5), 3)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=random_portion, shuffle=True)
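If the goal is to avoid sklearn's splitter entirely, the same shuffle-and-slice can be done with a NumPy permutation directly. A sketch under that assumption, drawing the training fraction from the same (0.2, 0.5) range as above:

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([1, 2, 1, 2, 1, 2])

rng = np.random.default_rng()
for i in range(30):
    random_portion = rng.uniform(0.2, 0.5)           # random training fraction
    n_train = max(1, round(random_portion * len(X))) # at least one sample
    idx = rng.permutation(len(X))                    # shuffled row indices
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```

Indexing `X` and `y` with the same permuted indices keeps features and labels aligned, which is what `shuffle=True` does internally.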

