I already have train and test datasets, how do I pass them to the model?

Usually we have one dataset and perform a train/test split on it, but I already have two datasets: a train dataset and a test dataset. How do I pass them to the model?

I am assuming your train dataset is the one with the labels and your test dataset is the one that is close to the real-world data you need to predict on. So use your train data as you typically would: perform EDA and so on. You can still split the train data 80-20 (or similar) and validate the model on the held-out part.

Once the model is trained, you can predict on the test set. Since your test set may not have labels, you will not get any metrics from it. All evaluation is done on the validation set.

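    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # X, y: features and labels from the train dataset; test: features from the test dataset
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    model = RandomForestClassifier()  # instantiate model
    model.fit(X_train, y_train)       # fit on the train data
    model.predict(X_val)              # predict on the validation set to measure performance
    model.predict(test)               # predict on the test set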
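The snippet above predicts on the validation set but does not show how to turn those predictions into a metric. A minimal, self-contained sketch of that step follows. The synthetic arrays stand in for the real train and test datasets, and `accuracy_score` is just one possible metric; both are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the two datasets (assumption: real data would be loaded from files).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # labelled training features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # training labels
test = rng.normal(size=(50, 4))          # unlabelled test features

# Hold out 20% of the labelled data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Metrics are only possible where labels exist, i.e. on the validation split.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")

# The test set has no labels, so we only produce predictions for it.
test_predictions = model.predict(test)
print(test_predictions.shape)
```

Fixing `random_state` makes the split and the forest reproducible, which helps when comparing models on the same validation rows.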

I'm going to assume that you're using Keras for this and have already built your model.

Since your datasets are already split, you can go ahead and train your model on the training set like this:

model.fit(x_train, y_train, batch_size = 64, epochs = 10)

Then, once you want to evaluate on your test set (assuming it has labels), just run:

model.evaluate(x_test, y_test, batch_size = 128)

If you aren't using Keras then let me know and we can work from there.
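For readers who are not on Keras, the same fit-then-evaluate pattern can be sketched with scikit-learn. This assumes the test set has labels, as the Keras calls above do. The `make_split` helper, the synthetic data, and the choice of `LogisticRegression` are all hypothetical placeholders for your own datasets and model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_split(n_rows):
    """Generate a hypothetical labelled dataset with n_rows samples."""
    features = rng.normal(size=(n_rows, 3))
    labels = (features.sum(axis=1) > 0).astype(int)
    return features, labels

# Two separately provided datasets; no train_test_split is needed.
x_train, y_train = make_split(300)
x_test, y_test = make_split(100)

model = LogisticRegression()
model.fit(x_train, y_train)                  # train on the training set only

# score() returns mean accuracy for classifiers; it needs the test labels.
test_accuracy = model.score(x_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

If the test set turns out to be unlabelled, replace the `score` call with `model.predict(x_test)` and evaluate on a validation split of the training data instead.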
