How to split unlabeled data into train and test set using train_test_split?

I am new to data science and am trying to build my first model. I am confused about the correct way to use the split function. Most documentation recommends the following approach (where X = data and y = label):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

I have a dataset without labels (X = data) and want to build a model on it to predict anomalies. That means I can only split my dataset into two parts (X_train and X_test). I am not sure whether this is correct for my dataset, and I would like to know how I should proceed to get y. Thank you in advance for your support.

You can see an example in the link. The function also works when it is given a single array:

train_test_split(y, shuffle=False)

In your case, the call would be:

X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
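
As for how to get y: in an unsupervised anomaly-detection setup there is usually no y to split, because the model is fitted on X_train alone and produces the anomaly labels itself. Below is a minimal sketch of that workflow. The synthetic X and the choice of IsolationForest are my assumptions for illustration only; the question does not say which data or detector is used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the unlabeled dataset X (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

# With a single array, train_test_split returns two pieces instead of four.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)

# No y is needed: the detector is fitted on the unlabeled training data only.
# IsolationForest is an assumed choice; other unsupervised detectors work similarly.
model = IsolationForest(random_state=1).fit(X_train)

# predict returns 1 for points judged normal and -1 for points judged anomalous.
pred_test = model.predict(X_test)
```

Here pred_test plays the role of a predicted y for the held-out data. If some true anomaly labels become available later, they can be compared against these predictions to evaluate the model.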
