
How to split the data set without train_test_split()?

I need to split my dataset into training and testing sets. I need the last 20% of the values for testing and the first 80% for training. I am currently using train_test_split(), but it picks the data randomly instead of taking the last 20%. How can I get the last 20% for testing and the first 80% for training? My code is as follows:

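from sklearn.model_selection import train_test_split

numpy_array = df.to_numpy()  # replaces as_matrix(), which was removed in pandas 1.0
X = numpy_array[:, 1:26]  # feature columns
y = numpy_array[:, 0]  # target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # I do not want the data to be random.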

Thanks

train_pct_index = int(0.8 * len(X))
X_train, X_test = X[:train_pct_index], X[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]

This is one of those situations where it is simpler not to involve sklearn helpers. Plain slicing is straightforward, readable, and does not depend on knowing the internal options of sklearn helpers, which readers of the code may not be familiar with.
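As a minimal sketch of the slicing approach above, the helper below wraps the same three lines in a function and runs it on toy data. The name ordered_split, the train_fraction parameter, and the toy arrays are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def ordered_split(X, y, train_fraction=0.8):
    """Split X and y in order: first rows for training, remaining rows for testing.

    `ordered_split` and `train_fraction` are hypothetical names used for illustration.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    split_index = int(train_fraction * len(X))  # same index logic as train_pct_index
    return X[:split_index], X[split_index:], y[:split_index], y[split_index:]


# Toy data (assumed): 10 rows, 2 feature columns, targets 0..9.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = ordered_split(X, y)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_test)  # [8 9]
```

Because int() truncates, a row count that does not divide evenly leaves the extra row in the test set.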

I think this Stack Overflow topic answers your question:

How to get a non-shuffled train_test_split in sklearn

And especially this piece of text:

In scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.

From the documentation:

shuffle : boolean, optional (default=True)

Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
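For comparison, here is a small sketch of the shuffle=False option described in the quoted documentation, assuming scikit-learn 0.19 or later is installed. The toy arrays are made up for illustration, and test_size=0.2 is passed as a fraction so that 20% of the rows are held out.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed): 10 rows, 2 feature columns, targets 0..9.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# shuffle=False keeps the original row order: the first 80% of rows go to
# training and the last 20% go to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```

Note that an integer test_size is interpreted as an absolute number of samples, so test_size=20 would hold out 20 rows rather than 20% of the data.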

Please tell me if I didn't understand your question correctly.
