
How to split unlabeled data into train and test set using train_test_split?

I am new to data science and am trying to build my first model. I am confused about the correct way to use the split function. Most documentation recommends the following approach (where X = data and y = labels):

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
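As a rough illustration of what that labeled call returns, here is a minimal sketch using a hypothetical random dataset (100 samples, 3 features, binary labels; the data is made up purely for demonstration). With test_size=0.2, 20 rows go to the test set and the label array is split in step with the features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: 100 samples, 3 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Same call as in the question: 20% of the rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
print(y_train.shape, y_test.shape)  # (80,) (20,)
```

Every array passed in comes back as a (train, test) pair, which is why two inputs produce four outputs.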

I have a dataset without labels (X = data) and want to build a model on it to detect anomalies. That means I can only split my dataset into two parts: X_train and X_test. But I am not sure whether this is correct for my dataset, and I would like to know how I should proceed to get y. Thank you in advance for your support.

You can see an example in the link. The function also works on a single array:

train_test_split(y, shuffle=False)
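To see what the single-array form does, here is a small sketch on a hypothetical array of ten values. With only one input, train_test_split returns two pieces; shuffle=False keeps the original order, and because no test_size is given, scikit-learn's default of 25% applies (ceil(0.25 * 10) = 3 test rows).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# A single, unlabeled array: train_test_split returns just two pieces.
data = np.arange(10)

# shuffle=False keeps the order: the head becomes the training part, the tail the test part.
# No test_size is given, so the default 25% test fraction is used (3 of 10 rows).
train, test = train_test_split(data, shuffle=False)

print(train)  # [0 1 2 3 4 5 6]
print(test)   # [7 8 9]
```

Keeping the order like this can matter for time-ordered data, where shuffling would leak future rows into the training set.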

In your case, the answer would be:

X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
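On the question of how to get y: with unlabeled data, one common option is to let an unsupervised anomaly detector produce the labels. The sketch below assumes scikit-learn's IsolationForest, which the original answer does not mention and is swapped in here only as an example; the 200-sample random dataset is likewise hypothetical. The split is done on X alone, the detector is fit on X_train, and predict() on X_test returns 1 for inliers and -1 for anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Hypothetical unlabeled data: 200 samples with 2 features, no y available.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))

# Split the features only; no label arrays are passed in or returned.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

# Assumption: IsolationForest as the unsupervised detector (not named in the original answer).
model = IsolationForest(random_state=1)
model.fit(X_train)

# predict() produces the labels itself: 1 for inliers, -1 for anomalies.
y_pred = model.predict(X_test)

print(X_train.shape, X_test.shape)  # (160, 2) (40, 2)
print("anomalies flagged in test set:", int((y_pred == -1).sum()))
```

Without ground-truth labels these predictions cannot be scored with accuracy-style metrics; the test split is mainly useful for checking how the detector behaves on data it did not see during fitting.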
