
How to train-test split and cross-validate in Surprise?

I wrote the following code, which works:

from surprise.model_selection import cross_validate

cross_validate(algo, dataset, measures=['RMSE', 'MAE'], cv=5, verbose=False, n_jobs=-1)

However, when I do this (note that the trainset, not the whole dataset, is passed to cross_validate):

from surprise.model_selection import train_test_split
trainset, testset = train_test_split(dataset, test_size=test_size)
cross_validate(algo, trainset, measures=['RMSE', 'MAE'], cv=5, verbose=False, n_jobs=-1)

It gives the following error:

AttributeError: 'Trainset' object has no attribute 'raw_ratings'

I looked it up, and the Surprise documentation says that Trainset objects are not the same as Dataset objects, which makes sense.

However, the documentation does not say how to convert a Trainset to a Dataset.

My questions are:

1. Is it possible to convert a Surprise Trainset to a Surprise Dataset?
2. If not, what is the correct way to train-test split the whole dataset and cross-validate?

From my understanding, cross_validate performs the trainset/testset splits for you. So your first snippet is correct: it splits the dataset into 5 folds (cv=5), and each fold in turn serves as the test set while the other 4 are used for training.
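To make the fold rotation concrete, here is a minimal pure-Python sketch of how a 5-fold split partitions 10 items. The helper name `k_fold_indices` is hypothetical and this is not Surprise's own implementation (which may assign items to folds differently); it only illustrates that every item is tested exactly once and trained on in the other folds.

```python
import random


def k_fold_indices(n_items, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each item lands in exactly one test fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for fold in range(n_splits):
        # Every n_splits-th shuffled index, starting at `fold`, forms the test fold.
        test_idx = indices[fold::n_splits]
        held_out = set(test_idx)
        # Everything outside the test fold is used for training in this round.
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(10, n_splits=5):
    print(len(train_idx), len(test_idx))  # prints "8 2" for each of the 5 folds
```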

If you want a simple train/test split instead, see the train/test split example in the Surprise documentation.
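If the goal is to cross-validate on one part of the data and keep the rest for a final check, one possible approach (similar to a recipe in the Surprise FAQ) is to split the Dataset's `raw_ratings` by hand, so that the object passed to cross_validate is still a Dataset rather than a Trainset. The sketch below assumes `data` is a Surprise Dataset that exposes `raw_ratings`, as the error message in the question implies; the helper names `split_raw_ratings` and `cross_validate_then_test` are hypothetical.

```python
import random


def split_raw_ratings(raw_ratings, test_size=0.2, seed=0):
    """Shuffle a copy of the ratings and return (work_part, holdout_part)."""
    shuffled = list(raw_ratings)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(test_size * len(shuffled)))
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


def cross_validate_then_test(data, algo, test_size=0.2, cv=5):
    """Cross-validate on one part of `data`, then score once on the held-out part."""
    # Imported here so the split helper above stays usable without Surprise installed.
    from surprise import accuracy
    from surprise.model_selection import cross_validate

    work, holdout = split_raw_ratings(data.raw_ratings, test_size)
    data.raw_ratings = work  # note: `data` is modified and now holds only the non-held-out ratings

    cv_results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=cv, verbose=False)

    # Refit on all non-held-out ratings, then evaluate once on the held-out ones.
    algo.fit(data.build_full_trainset())
    predictions = algo.test(data.construct_testset(holdout))
    return cv_results, accuracy.rmse(predictions, verbose=False)
```

With the `dataset` and `algo` objects from the question, this would be called as `cv_results, holdout_rmse = cross_validate_then_test(dataset, algo)`.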

