
“Error in drop(y %*% rep(1, nc))” error for cv.glmnet in the glmnet R package

I have a function that returns the AUC for a cv.glmnet model. Often, though not in the majority of runs, the cv.glmnet call fails with the following error:

    Error in drop(y %*% rep(1, nc)) :
      error in evaluating the argument 'x' in selecting a method for function 'drop': Error in y %*% rep(1, nc) : non-conformable arguments

I've read a little about the error, and the only suggestion I could find was to use data.matrix() instead of as.matrix(). My function is as follows (where "form" is a formula with my desired variables and "dt" is the data frame):

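    library(glmnet)  # cv.glmnet
    library(ROCR)    # prediction, performance

    auc_cvnet <- function(form, dt, standard = FALSE) {
      vars <- all.vars(form)
      depM <- dt[[vars[1]]]               # response: first variable in the formula
      indM <- data.matrix(dt[vars[-1]])   # predictors as a numeric matrix
      model <- cv.glmnet(indM, depM, family = "binomial", nfolds = 3,
                         type.measure = "auc", standardize = standard)

      # In-sample predicted probabilities at cv.glmnet's default choice of lambda
      pred <- predict(model, newx = indM, type = "response")
      tmp <- prediction(pred, depM)
      auc.tmp <- performance(tmp, "auc")
      return(as.numeric(auc.tmp@y.values))
    }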

I call this function from another function that iterates through combinations of a few variables to see which combinations work well (it's a pretty brute-force method). When the error was thrown, I printed the formula for that iteration and then called the function with just that formula, and it worked fine. So unfortunately I can't pinpoint which calls throw the error; otherwise I'd give more information. The data frame has about 30 rows, and there are no errors when I run my code on a larger data set with 110 rows. There are also no NAs in either data set.
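An intermittent failure like this can usually be made reproducible by fixing the RNG seed before each call and recording it when the call fails. Below is a minimal sketch in base R. `try_with_seed()` and `flaky_fit()` are hypothetical names introduced here for illustration, and `flaky_fit()` is only a toy stand-in for a randomly failing fit, not glmnet itself.

```r
# Hypothetical helper: run fun(...) under a known seed and, on error,
# return the seed and the message instead of stopping the surrounding loop.
try_with_seed <- function(fun, ..., seed) {
  set.seed(seed)
  tryCatch(
    list(ok = TRUE, value = fun(...), seed = seed),
    error = function(e) list(ok = FALSE, message = conditionMessage(e), seed = seed)
  )
}

# Toy stand-in for a randomly failing fit (assumption, not glmnet):
# it fails when the random hold-out leaves a single class for training.
flaky_fit <- function(y) {
  held_out <- sample(seq_along(y), size = length(y) %/% 3)
  if (length(unique(y[-held_out])) < 2) stop("only one class left in training data")
  mean(y[-held_out])
}

y <- c(rep(FALSE, 28), TRUE, TRUE)
results <- lapply(1:200, function(s) try_with_seed(flaky_fit, y, seed = s))

# Seeds that failed; re-running with one of them reproduces the same failure.
failed <- Filter(function(r) !r$ok, results)
failed_seeds <- vapply(failed, function(r) r$seed, numeric(1))
```

In the real loop, the same wrapper could be put around the auc_cvnet(form, dt) call so that a failing formula can be replayed with its seed.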

Has anyone seen this before, or does anyone have any thoughts? Thanks!

Believe it or not, I got this same error today. Since I don't know your dataset, I can't say for sure what the cause is, but in my case the data I was passing as my y variable (your depM) was a column of all TRUE values. cv.glmnet would only return a valid model if my y variable contained both TRUE and FALSE values.
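A quick way to check for this before fitting is to count the distinct classes in the response. This is only a sketch: `has_both_classes()` is a hypothetical helper name, and the two example vectors are made up.

```r
# Hypothetical guard: TRUE only if the response has at least two distinct
# non-missing values.
has_both_classes <- function(y) {
  length(unique(y[!is.na(y)])) >= 2
}

dep_all_true <- rep(TRUE, 30)                    # made-up response, one class
dep_mixed    <- c(rep(TRUE, 25), rep(FALSE, 5))  # made-up response, two classes

has_both_classes(dep_all_true)  # FALSE
has_both_classes(dep_mixed)     # TRUE
table(dep_mixed)                # class counts, handy for spotting rare classes
```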

I wish I could explain why cv.glmnet requires both TRUE and FALSE, but I don't understand the function well enough (I am only adapting code that was given to me). I just thought I'd post this in case it helps you troubleshoot. Good luck!

I have the same problem when running cv.glmnet on a dataset with 2 positive cases and 850 negative ones. In one of the cross-validation iterations (where the training and testing sets are randomly sampled), both positive cases end up outside the training set. Thus, glmnet calls lognet, which in turn calls drop(y %*% rep(1, nc)), but y is a vector and not a matrix with at least two columns.
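How often this happens can be estimated with a small simulation. The sketch below assumes folds are assigned by shuffling a balanced vector of fold labels and uses 3 folds as in the question; `random_foldid()` and `training_set_loses_positives()` are hypothetical helper names, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

n_neg <- 850; n_pos <- 2; nfolds <- 3
y <- c(rep(0, n_neg), rep(1, n_pos))
n <- length(y)

# Assumed fold assignment: shuffle a balanced vector of fold labels.
random_foldid <- function(n, nfolds) {
  sample(rep(seq_len(nfolds), length.out = n))
}

# TRUE when some fold holds every positive case, so the training set
# for that fold contains no positives at all.
training_set_loses_positives <- function(y, foldid) {
  any(vapply(unique(foldid), function(k) !any(y[foldid != k] == 1), logical(1)))
}

hits <- replicate(2000, training_set_loses_positives(y, random_foldid(n, nfolds)))
simulated <- mean(hits)

# Exact value for two positives and equal-sized folds: the second positive
# must fall into the same fold as the first.
exact <- (n / nfolds - 1) / (n - 1)  # 283 / 851
```

Under these assumptions, roughly one run in three has a fold whose training set contains no positive case, which fits the error showing up often but not always.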

The easiest fix I can think of is to pass the foldid parameter to cv.glmnet and make sure that at least two classes are present in the training data in every iteration.
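One way to build such a foldid vector is to assign folds separately within each class. The following is a minimal sketch in base R, not the only way to do it. `make_stratified_foldid()` is a hypothetical helper name, and the example response is made up.

```r
# Hypothetical helper: deal the members of each class out to the folds
# round-robin, so every class is spread over as many folds as possible.
make_stratified_foldid <- function(y, nfolds) {
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle within the class; sample.int avoids sample()'s special
    # behaviour when idx has length 1.
    idx <- idx[sample.int(length(idx))]
    foldid[idx] <- rep(seq_len(nfolds), length.out = length(idx))
  }
  foldid
}

set.seed(42)                           # arbitrary seed
y <- c(rep(TRUE, 5), rep(FALSE, 25))   # made-up response, 30 rows
foldid <- make_stratified_foldid(y, nfolds = 3)

# Class counts per fold.
table(fold = foldid, class = y)
```

The result can then be passed as cv.glmnet(indM, depM, family = "binomial", foldid = foldid, type.measure = "auc"); when foldid is supplied, nfolds can be left out. Note that this only guarantees at least one observation of each class in every training set, and only when each class has at least two observations. With extremely rare classes that may still be too few for a stable fit.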
