Error while trying to use lapply for logistic regression in R

I created a subset of variables from the data (while keeping the rest of the data intact). Then I tried to use lapply to fit many logistic regression models at once, looping through the subset so that each variable in it serves as the predictor in its own model.

#Creating a list for the loop to run through
metab.start <- which(colnames(df) == "Anhydro_1.5_D_glucitolArea"); metab.start
metab.stop <- which(colnames(df) == "ErythritolArea"); metab.stop
metabolite.names <- colnames(df)[metab.start:metab.stop]

#logistic regression loop
mdls<-lapply(metabolite.names, function(X) glm(hpresponse1~X, data=df, 
family="binomial"))

But it produces this error:

Error in model.frame.default(formula = hpresponse1 ~ X, data = df, 
drop.unused.levels = TRUE) : 
variable lengths differ (found for 'X')

I am not sure, but I think the issue is the data type of the subset object. When I call str() on it (metabolite.names), it reports a character vector, but I thought lapply was for lists, with sapply and mapply for vectors and matrices. Is that correct? My other concern is that the individual data values may not have been retained when the subset was made, which would explain the error saying that the variable lengths differ. Do I need to subset a different way to create a matrix and then use mapply? Can I do that and retain both the variable names and the observations? Or is there a way to loop through using the object I have already made? Or am I wrong about the cause, and if so, what could the issue be?

As another note, there are over 100 predictor variables I am attempting to loop through. I plan to add other predictor variables that will not be looped over, but I am just trying to get the loop to work first.

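In your function, X is the name of a column (a single character string), not the values in that column. The formula hpresponse1~X therefore compares a length-1 string against a full-length response, which is why the variable lengths differ. Try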

glm(as.formula(paste0("hpresponse1~",X)), ...
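To make the suggestion above concrete, here is a minimal sketch of the whole loop on made-up data. The data frame, the metabolite column names (metabA, metabB, metabC) and the extra covariate age are all hypothetical stand-ins for the asker's df, and reformulate() is used as one alternative to as.formula(paste0(...)) for building the formula from a column name. Note that lapply accepts a character vector directly, so the type of metabolite.names is not the problem.

```r
# Toy data standing in for the asker's df; all column names are hypothetical.
set.seed(1)
n <- 200
df <- data.frame(
  hpresponse1 = rbinom(n, 1, 0.5),  # binary outcome
  metabA = rnorm(n),
  metabB = rnorm(n),
  metabC = rnorm(n),
  age = rnorm(n, 50, 10)            # hypothetical covariate kept in every model
)

# Character vector of the predictors to loop over
metabolite.names <- c("metabA", "metabB", "metabC")

# Build each formula from the column *name*, then fit one model per metabolite.
# reformulate(c(x, "age"), response = "hpresponse1") yields hpresponse1 ~ x + age.
mdls <- lapply(metabolite.names, function(x) {
  f <- reformulate(c(x, "age"), response = "hpresponse1")
  glm(f, data = df, family = binomial)
})
names(mdls) <- metabolite.names

# Collect the coefficient row of the looped predictor from each model
# (columns: Estimate, Std. Error, z value, Pr(>|z|)).
coefs <- t(sapply(metabolite.names, function(x) coef(summary(mdls[[x]]))[x, ]))
print(coefs)
```

Predictors that should not be looped over can be added to the vector passed to reformulate(), as age is here. If a column name is not a syntactically valid R name, it would likely need to be wrapped in backticks before being placed in the formula.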
