How to split the dataset without train_test_split()?
I need to split my dataset into a training set and a test set: the first 80% of the rows for training and the last 20% for testing. I am currently using train_test_split(), but it picks the rows randomly instead of taking the last 20%. How can I get the first 80% for training and the last 20% for testing? My code is as follows:
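from sklearn.model_selection import train_test_split

numpy_array = df.to_numpy()  # df.as_matrix() was removed in pandas 1.0
X = numpy_array[:, 1:26]
y = numpy_array[:, 0]
# test_size=0.2 means 20% of the rows (an int such as 20 would mean 20 samples)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # I do not want the data to be random.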
Thanks
# Index where the last 20% of the rows begins
train_pct_index = int(0.8 * len(X))
X_train, X_test = X[:train_pct_index], X[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]
It's one of those situations where it's simply better not to involve sklearn helpers. Plain slicing is very straightforward, readable, and doesn't depend on knowing the options of sklearn helpers, which readers of your code may not be familiar with.
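For reference, here is a self-contained sketch of the slicing approach, wrapped in a small helper. The function name sequential_split, its train_fraction parameter and the toy array are hypothetical, chosen only for illustration; it assumes X and y are NumPy arrays with the same number of rows, laid out like the question's data (target in column 0, features in columns 1 to 25).

```python
import numpy as np


def sequential_split(X, y, train_fraction=0.8):
    """Hypothetical helper: split X and y in their original order.

    The first train_fraction of the rows go to training and the remaining
    rows (the last part of the data) go to testing. Nothing is shuffled.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    # Index of the first row that belongs to the test set.
    split_index = int(train_fraction * len(X))
    X_train, X_test = X[:split_index], X[split_index:]
    y_train, y_test = y[:split_index], y[split_index:]
    return X_train, X_test, y_train, y_test


# Toy data with the question's layout: target in column 0, features in 1-25.
data = np.arange(10 * 26).reshape(10, 26)
X = data[:, 1:26]
y = data[:, 0]

X_train, X_test, y_train, y_test = sequential_split(X, y)
print(len(X_train), len(X_test))  # 8 2
print(y_test)  # [208 234], the targets of the last two rows
```

Because int() truncates, any rounding leftover goes to the test set: with 11 rows, 0.8 * 11 = 8.8 gives 8 training rows and 3 test rows.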
I think this Stack Overflow question answers yours: "How to get a non-shuffled train_test_split in sklearn".
And especially this part:
"In scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split."
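A minimal sketch of the shuffle=False approach, assuming scikit-learn 0.19 or later and hypothetical toy data with the question's layout. Note test_size=0.2 here: a float is read as a fraction of the rows, whereas an integer such as the question's test_size=20 is read as an absolute number of test samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data with the question's layout: target in column 0, features in 1-25.
data = np.arange(10 * 26).reshape(10, 26)
X = data[:, 1:26]
y = data[:, 0]

# shuffle=False keeps the original row order, so the training set is the
# first 80% of the rows and the test set is the last 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(len(X_train), len(X_test))  # 8 2
print(y_test)  # [208 234], the targets of the last two rows
```

With 10 rows this yields 8 training rows followed by 2 test rows, in their original order.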
From the documentation:

shuffle : boolean, optional (default=True)
Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
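To illustrate the stratify restriction quoted above, here is a sketch with hypothetical toy data, assuming a recent scikit-learn release, where combining shuffle=False with a non-None stratify raises a ValueError rather than silently ignoring stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)  # two balanced classes, alternating

# Allowed: an unshuffled split with stratify left at its default of None.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Rejected: stratification requires shuffling, so this raises ValueError.
try:
    train_test_split(X, y, test_size=0.2, shuffle=False, stratify=y)
    stratify_error = None
except ValueError as exc:
    stratify_error = exc
print(stratify_error)
```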
Please tell me if I didn't understand your question correctly.