I want to use sklearn's train_test_split to manually split data into train and test categories. Specifically, in my .csv file, I want to use all the rows of data until the last row to train, and the last row to test.
The reason I'm doing this is that I need to launch a machine learning model but am incredibly short on time. I thought the best way would be to use predictions rather than deploying it using IBM Watson. I don't need it to be live. My code so far looks like this:
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('Book5.csv', names=['Amiability', 'Email'])
df_x = df['Amiability']
df_y = df['Email']
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)
Then,
len(df)
Produces
331
I want to train on every row except the last (indices 0-329) and test on the last row (index 330). How can I do this?
If you don't absolutely need the test row to be the last row you should be able to do:
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=1, random_state=4)
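When test_size is an integer, it specifies the absolute number of rows in the test set. Note that train_test_split shuffles the data by default, so this holds out one row chosen at random; passing shuffle=False keeps the original order, which makes the held-out row the last one.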
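If the test row does have to be the last one, a minimal sketch along these lines should work. It assumes the same two column names as in the question and uses a small made-up DataFrame in place of Book5.csv (the values are hypothetical, not taken from the real file). The plain pandas slicing at the end is an alternative that gives the same split without train_test_split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Book5.csv; the real file has 331 rows.
df = pd.DataFrame({
    "Amiability": [0.10, 0.40, 0.35, 0.80, 0.65],
    "Email": ["a", "b", "c", "d", "e"],
})
df_x = df["Amiability"]
df_y = df["Email"]

# test_size=1 holds out exactly one row.
# shuffle=False keeps the original order, so that row is the last one.
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=1, shuffle=False
)

# Alternative: the same split with plain positional slicing.
x_train_alt, x_test_alt = df_x.iloc[:-1], df_x.iloc[-1:]
y_train_alt, y_test_alt = df_y.iloc[:-1], df_y.iloc[-1:]

print(len(x_train), len(x_test))
print(x_test.index.tolist())
```

With the real 331-row file, the same call should leave 330 rows for training and the final row for testing.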