I have Python code for a prediction task. The task is to predict a company's sales across the years 2015 to 2019.
I want to split the data into a training set and a test set.
Specifically, I want to train the model on the data from 2015 to 2018 and test it on the data from 2019.
How can I do this kind of conditional split using train_test_split, ShuffleSplit,
X_train = df.iloc[train_index]
X_test = df.iloc[test_index]
y_train = X_train.Sales
y_test = X_test.Sales
Because you impose a condition up front, you lose the benefit of the shuffling that these splitting utilities perform. I would therefore recommend against doing a train-test split under such a condition (I assume the results will be biased). Nevertheless, if you need to do it, try:
train = your_data[your_data['year_column'] < 2019]
test = your_data[your_data['year_column'] == 2019]
X_train = train.loc[:, train.columns != 'column_of_interest']
y_train = train['column_of_interest']
X_test = test.loc[:, test.columns != 'column_of_interest']
y_test = test['column_of_interest']
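To make the year-based split above concrete, here is a minimal, self-contained sketch. The column names `Year`, `Store`, and `Sales`, the toy values, and the helper `split_by_year` are all assumptions made for illustration; they are not taken from the asker's data.

```python
import pandas as pd


def split_by_year(df, year_col, target_col, test_year):
    """Hypothetical helper: rows before test_year train, rows in test_year test."""
    train = df[df[year_col] < test_year]
    test = df[df[year_col] == test_year]

    # Features are every column except the target.
    X_train = train.drop(columns=[target_col])
    y_train = train[target_col]
    X_test = test.drop(columns=[target_col])
    y_test = test[target_col]
    return X_train, X_test, y_train, y_test


# Toy data standing in for the real sales table (values are made up).
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019, 2019],
    "Store": [1, 1, 1, 1, 1, 2],
    "Sales": [100.0, 120.0, 130.0, 150.0, 160.0, 90.0],
})

X_train, X_test, y_train, y_test = split_by_year(df, "Year", "Sales", 2019)
```

The four returned objects can be passed to an estimator's fit and predict calls in the same way as the output of train_test_split.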