Conditionally splitting data into training and test sets (Pandas)

Question

I have some Python code for a prediction task: predicting a company's sales for each year from 2015 to 2019.

I want to split the data into a training set and a test set.

The catch is that I want to train the model on the data from 2015 to 2018 and test it on the data from 2019.

How can I do this conditional split using train_test_split or ShuffleSplit?

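# train_index and test_index are positional row indices (e.g. from a splitter)
train_df = df.iloc[train_index]
test_df = df.iloc[test_index]
# Drop the target from the features so Sales does not leak into X
X_train = train_df.drop(columns="Sales")
X_test = test_df.drop(columns="Sales")
y_train = train_df["Sales"]
y_test = test_df["Sales"]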
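The snippet above expects positional arrays train_index and test_index, which a splitter such as ShuffleSplit would normally produce. A minimal sketch of building those arrays from the year condition instead of by shuffling, assuming a hypothetical Year column and made-up toy values (neither comes from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the "Year" and "Price" columns and all values are made up
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019, 2019],
    "Price": [9.5, 9.8, 10.1, 10.4, 10.9, 11.0],
    "Sales": [100, 110, 125, 130, 150, 145],
})

# Positional indices (what df.iloc expects), chosen by a condition rather than shuffling
years = df["Year"].to_numpy()
train_index = np.flatnonzero(years <= 2018)
test_index = np.flatnonzero(years == 2019)

train_df = df.iloc[train_index]
test_df = df.iloc[test_index]
# Keep every column except the target as features
X_train = train_df.drop(columns="Sales")
y_train = train_df["Sales"]
X_test = test_df.drop(columns="Sales")
y_test = test_df["Sales"]
```

np.flatnonzero returns the positions where the mask is true, so the same iloc-based selection works whether the indices come from a splitter or from a condition.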

Answer 1

Since you impose a condition right at the start, you lose the benefit of the shuffling normally used in machine-learning preprocessing. I would therefore recommend against performing a train-test split with such a condition (I would expect the results to be biased). Nevertheless, if you need to do it, try:

# Rows from before 2019 form the training set; rows from 2019 form the test set
train = your_data[your_data['year_column'] < 2019]
test = your_data[your_data['year_column'] == 2019]

# Features are every column except the target 'column_of_interest'
X_train = train.loc[:, train.columns != 'column_of_interest']
y_train = train['column_of_interest']
X_test = test.loc[:, test.columns != 'column_of_interest']
y_test = test['column_of_interest']
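The same two masks can be wrapped in a small helper and checked on made-up data. In this sketch, split_by_year is a hypothetical name and the data values are invented; the column names are the placeholders from the answer. drop(columns=...) is used here as an equivalent of the .loc[:, columns != ...] selection:

```python
import pandas as pd


def split_by_year(data, year_col, target_col, test_year):
    """Hypothetical helper: train on years before test_year, test on test_year."""
    train = data[data[year_col] < test_year]
    test = data[data[year_col] == test_year]
    if train.empty or test.empty:
        raise ValueError("one side of the split is empty; check year_col and test_year")
    # Same effect as .loc[:, columns != target_col]: keep every column except the target
    X_train = train.drop(columns=target_col)
    y_train = train[target_col]
    X_test = test.drop(columns=target_col)
    y_test = test[target_col]
    return X_train, X_test, y_train, y_test


# Made-up sales data, with some years having more than one row
sales = pd.DataFrame({
    "year_column": [2015, 2015, 2016, 2017, 2018, 2019, 2019],
    "feature": [1, 2, 3, 4, 5, 6, 7],
    "column_of_interest": [10, 11, 12, 14, 15, 17, 18],
})

X_train, X_test, y_train, y_test = split_by_year(
    sales, "year_column", "column_of_interest", 2019
)
print(len(X_train), len(X_test))  # prints: 5 2
```

The return order mirrors what train_test_split returns for a feature matrix and a target, so the helper can be swapped in where that call was used.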

Answer 1: accepted, 0 votes, 2019-12-19 23:02:17