How to use train_test_split to split unlabeled data into training and test sets?

Question

I am new to data science and am trying to build my first model. I am confused about the correct way to use the split function. Most documentation recommends the following approach (where X = data and y = labels):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

I have a dataset without labels (X = data) and want to build a model on it to predict anomalies. That means I can only split my dataset into two parts: X_train and X_test. But I am not sure whether this is correct for my dataset, and I would like to know how I should proceed to get y. Thank you in advance for your support.

Answer 1

You can see the example in the link. The function also works on a single array:

train_test_split(y, shuffle=False)
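To make the single-array call concrete, here is a minimal sketch. The array `values` is hypothetical toy data, not something from the question; it only illustrates that one input array yields two output arrays.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten values standing in for a single unlabeled array.
values = np.arange(10)

# One input array gives two outputs: the training part and the test part.
# shuffle=False keeps the original order; test_size is left at its default (0.25).
train_part, test_part = train_test_split(values, shuffle=False)

print(train_part)  # first 7 values
print(test_part)   # last 3 values
```

Because shuffling is disabled, the split is simply a cut along the original order, which can matter for time-ordered data.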

In your case, the answer will be:

from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
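As for "how to get y": with unlabeled data there is no ground-truth y to split. One possible way forward, sketched below, is to fit an unsupervised detector on X_train and treat its predictions on X_test as anomaly flags. The synthetic `X` and the choice of `IsolationForest` are assumptions made for illustration; the original answer does not prescribe a model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Hypothetical unlabeled data: 200 rows, 3 features (a stand-in for the asker's X).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Only X is passed, so only two arrays come back.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

# Fit an unsupervised detector on the training part only (assumed model choice).
detector = IsolationForest(random_state=1).fit(X_train)

# predict returns +1 for inliers and -1 for outliers.
# These are model outputs, not ground-truth labels.
y_test_pred = detector.predict(X_test)

print(X_train.shape, X_test.shape)
```

The predicted flags can be inspected or compared against domain knowledge, but without true labels they cannot be scored with the usual supervised metrics.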

Solution 1
0 2020-12-18 01:20:01
