Python (sklearn) train_test_split: choosing which data to train and which data to test

I want to use sklearn's train_test_split to control exactly which rows go into the training set and which go into the test set. Specifically, in my .csv file, I want to train on every row except the last one, and test on the last row.

I'm doing this because I need to launch a machine learning model but am very short on time. I thought the quickest way would be to generate predictions directly rather than deploy the model with IBM Watson. I don't need it to be live.

My code so far looks like this:

 import pandas as pd
 from sklearn.model_selection import train_test_split

 df = pd.read_csv('Book5.csv', names=['Amiability', 'Email'])
 df_x = df['Amiability']
 df_y = df['Email']
 x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)

Then,

 len(df)

Produces

331

Since the rows are zero-indexed, I want to train with rows 0-329 and test with row 330 (the last row). How can I do this?

If the test row doesn't absolutely have to be the last row, you should be able to do:

x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=1, random_state=4)

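When test_size is an integer, it specifies the absolute number of rows in the test set; a float is interpreted as a proportion of the data. Because train_test_split shuffles by default, that single test row is picked at random (reproducibly, given random_state).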
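If the test row does have to be the last one, the following is a minimal sketch of two ways that should work. It assumes the same column names as the question; the small inline DataFrame is a made-up stand-in for Book5.csv so the snippet runs on its own. The first option passes shuffle=False, which keeps the original row order so the test rows come from the end of the data. The second skips train_test_split and slices by position with iloc.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Book5.csv: 5 rows instead of 331.
df = pd.DataFrame({
    "Amiability": [0.1, 0.4, 0.35, 0.8, 0.6],
    "Email": [0, 1, 0, 1, 1],
})
df_x = df["Amiability"]
df_y = df["Email"]

# Option 1: shuffle=False preserves row order, so the test set is taken
# from the end of the data; test_size=1 means exactly one row.
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=1, shuffle=False
)

# Option 2: slice by position; everything but the last row trains,
# the last row tests. iloc[-1:] keeps the result a Series of length 1.
x_train_manual, x_test_manual = df_x.iloc[:-1], df_x.iloc[-1:]
y_train_manual, y_test_manual = df_y.iloc[:-1], df_y.iloc[-1:]
```

Both options should give the same split. Note that a single test row lets you produce one prediction but is not a meaningful estimate of model accuracy.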

