Python (sklearn) train_test_split: choosing which data to train and which data to test

Question

I want to use sklearn's train_test_split to manually split my data into training and test sets. Specifically, I want to train on every row of my .csv file except the last one, and test on the last row.

I'm doing this because I need to launch a machine learning model but am very short on time. I thought the quickest route would be to generate predictions directly rather than deploy the model on IBM Watson; it doesn't need to be live.

My code so far looks like this:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Load the two columns; names= assigns the column labels
    df = pd.read_csv('Book5.csv', names=['Amiability', 'Email'])
    df_x = df['Amiability']
    df_y = df['Email']
    # Random 80/20 split (shuffled by default)
    x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)

Then

    len(df)

returns 332.

I want to train with rows 0 to 330 and test with row 331. How can I do this?
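One way to get exactly this split, without train_test_split, is positional slicing with `iloc`. This is a minimal sketch, not the accepted answer: a small synthetic DataFrame with hypothetical values stands in for Book5.csv so the snippet runs on its own, and it assumes the same two column names as the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('Book5.csv', names=['Amiability', 'Email'])
df = pd.DataFrame({
    "Amiability": range(332),
    "Email": [i % 2 for i in range(332)],
})

df_x = df["Amiability"]
df_y = df["Email"]

# Every row except the last goes to training (rows 0 to 330 here)
x_train, y_train = df_x.iloc[:-1], df_y.iloc[:-1]
# Only the last row goes to testing (row 331 here); [-1:] keeps it a Series
x_test, y_test = df_x.iloc[-1:], df_y.iloc[-1:]

print(len(x_train), len(x_test))  # 331 1
```

If you then pass `x_train` to an sklearn estimator, note that a single feature usually has to be 2-D, so `df[['Amiability']]` (a DataFrame) is safer than `df['Amiability']` (a Series).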

Answer 1

If the test row doesn't have to be the last row, you should be able to do:

    x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=1, random_state=4)

When test_size is an integer, it specifies the absolute number of samples (rows) in the test set; when it is a float, it is the fraction of the data used for testing.
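If the test row does need to be the last one, keep in mind that train_test_split shuffles by default. Passing `shuffle=False` keeps the original row order, so combined with `test_size=1` the test set is just the final row (and `random_state` no longer matters). A minimal sketch on a hypothetical ten-row DataFrame:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for Book5.csv
df = pd.DataFrame({"Amiability": range(10), "Email": [0, 1] * 5})
df_x = df[["Amiability"]]  # 2-D feature table, as sklearn estimators expect
df_y = df["Email"]

# test_size=1 (an int) means exactly one test sample;
# shuffle=False takes that sample from the end of the data
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=1, shuffle=False
)
print(x_test.index.tolist())  # [9]
```

With `shuffle=False` the split is purely positional: the first `n - test_size` rows are used for training and the remaining rows for testing. `stratify` must be left as `None` in this mode.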

1 answer

solution1 0 2022-06-28 23:42:46

Python (sklearn) train_test_split: choosing which data to train and which data to test

Question

1 answers

solution1 0 2022-06-28 23:42:46

solution1
0 2022-06-28 23:42:46