I already have train and test datasets; how do I pass them to the model?
Usually we have one dataset and perform a train-test split, but now I already have two datasets, i.e. a train dataset and a test dataset. How do I pass them to the model?
I am assuming your train dataset is the one with the labels, and your test dataset is the one that is close to the real-world data you need to predict on. So use your train data as you typically would: perform EDA, etc. You can still split the train data 80-20 (or similar) and use the held-out part to validate the model.
Once the model is trained, you can predict on the test set. Since your test set may not have labels, you will not get any metrics from it; all evaluation is done on the validation set.
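from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)  # hold out 20% of the labelled train data
model = RandomForestClassifier()  # instantiate model
model.fit(X_train, y_train)  # fit on the train data
val_pred = model.predict(X_val)  # predict on the validation set
print(accuracy_score(y_val, val_pred))  # measure performance against the validation labels
test_pred = model.predict(test)  # predict on the (unlabelled) test set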
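To make the "two datasets" part concrete, here is a minimal sketch assuming both datasets are pandas DataFrames: `train_df` has a label column and `test_df` has only features. The column names (`f1`, `f2`, `f3`, `label`) and the random data are hypothetical stand-ins for whatever you load with e.g. `pd.read_csv`. The key points are that labels are separated from the train set only, and that the test set is passed with exactly the same feature columns in the same order.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the two datasets you already have.
rng = np.random.default_rng(0)
feature_cols = ["f1", "f2", "f3"]
train_df = pd.DataFrame(rng.normal(size=(200, 3)), columns=feature_cols)
train_df["label"] = (train_df["f1"] + train_df["f2"] > 0).astype(int)  # labelled
test_df = pd.DataFrame(rng.normal(size=(50, 3)), columns=feature_cols)  # no labels

# Separate features and labels in the train set only.
X = train_df[feature_cols]
y = train_df["label"]

# Hold out 20% of the labelled data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Metrics come from the validation set, because it has labels.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")

# The test set is only predicted on, using the same feature columns.
test_pred = model.predict(test_df[feature_cols])
```

Selecting `test_df[feature_cols]` rather than passing `test_df` directly guards against extra or reordered columns in the test file.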
I'm going to assume that you're using Keras for this and have already built your model.
Since you've already split your datasets, you can go ahead and train your model on the training set like this:
model.fit(x_train, y_train, batch_size = 64, epochs = 10)
Then, once you want to use your test set, just run the following (this requires labels for the test set; if it has none, use model.predict(x_test) instead):
model.evaluate(x_test, y_test, batch_size = 128)
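For reference, a minimal end-to-end Keras sketch of the same workflow, assuming TensorFlow's bundled Keras is installed. The tiny two-layer model and the random binary-classification data are hypothetical placeholders for your own model and datasets. `validation_split=0.2` holds out part of the training data for validation (the Keras counterpart of the 80-20 split mentioned above), `evaluate` scores the labelled test set, and `predict` works even when the test set has no labels.

```python
import numpy as np
from tensorflow import keras

# Hypothetical placeholder data: 4 features, binary labels.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, 4)).astype("float32")
y_train = (x_train.sum(axis=1) > 0).astype("float32")
x_test = rng.normal(size=(64, 4)).astype("float32")
y_test = (x_test.sum(axis=1) > 0).astype("float32")

# A small placeholder model; substitute the one you already built.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train on the train set, holding out the last 20% of it for validation.
history = model.fit(x_train, y_train, batch_size=64, epochs=10,
                    validation_split=0.2, verbose=0)

# Evaluate on the test set (only possible because y_test exists).
loss, acc = model.evaluate(x_test, y_test, batch_size=128, verbose=0)

# Predict on the test set (works with or without labels).
probs = model.predict(x_test, verbose=0)
```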
If you aren't using Keras, let me know and we can work from there.