How to split a dataset without using train_test_split()?

Question

I need to split my dataset into training and testing sets. I need the last 20% of the values for testing and the first 80% for training. I am currently using train_test_split(), but it picks the data randomly instead of taking the last 20%. How can I get the last 20% for testing and the first 80% for training? My code is as follows:

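from sklearn.model_selection import train_test_split

numpy_array = df.to_numpy()  # df.as_matrix() was removed in pandas 1.0
X = numpy_array[:, 1:26]  # feature columns
y = numpy_array[:, 0]  # target column
# test_size=0.2 holds out 20% of the rows (an integer such as 20 would mean 20 rows).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # I do not want the data to be random.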

Thanks

Answer 1

train_pct_index = int(0.8 * len(X))
X_train, X_test = X[:train_pct_index], X[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]

It's one of those situations where it's just better not to involve sklearn helpers. Plain slicing is straightforward, readable, and doesn't depend on knowing the internal options of sklearn helpers, which readers of the code may not have experience with.
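For anyone who wants to check the slicing approach on small data, here is a minimal sketch that wraps it in a function. The name `ordered_split`, the `train_fraction` parameter, and the toy arrays are my own illustration and are not part of the original answer.

```python
import numpy as np


def ordered_split(X, y, train_fraction=0.8):
    """Split X and y by position: the first rows train, the remaining rows test."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    # Rows before this index go to training, the rest to testing.
    split_index = int(train_fraction * len(X))
    return X[:split_index], X[split_index:], y[:split_index], y[split_index:]


# Toy data (hypothetical): 10 rows, 2 feature columns, targets 0..9 in order.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = ordered_split(X, y)
print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```

Because `int()` truncates, any leftover row ends up in the test set when the row count does not divide evenly.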

Answer 2

I think this Stack Overflow topic answers your question:

How to get a non-shuffled train_test_split in sklearn

And especially this piece of text:

In scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.

From the documentation:

shuffle : boolean, optional (default=True)

Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
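As a rough illustration of that parameter, the sketch below calls train_test_split with shuffle=False on made-up data. It assumes scikit-learn 0.19 or newer is installed; the toy arrays are mine, not the asker's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 rows, 2 feature columns, targets 0..9 in order.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# shuffle=False keeps the original row order, so the test set is the tail of the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```

Note that test_size is given as the fraction 0.2 here; an integer would be read as an absolute number of rows.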

Please tell me if I didn't understand your question correctly.

Answer 1
1 Accepted 2018-03-01 16:41:37

Answer 2
1 2018-03-01 16:42:07
