variable lengths differ (found for 'x')
I've seen several cases of this error, but none of them seem to solve or apply to my situation.
I am building a logistic regression model with biglm.
I have a data.frame with ~250 variables and a little over a million rows.
Since bigglm() doesn't work with the dot notation to select all variables in the model, I am building my formula like this.
So if f is my formula and df is my data frame, then my model looks like this:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
And I get the error: variable lengths differ (found for 'x')
When I check the length of x, it is exactly the same as the length of df.
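When making this kind of comparison, it may help to keep in mind that length() on a data.frame returns the number of columns, so nrow() is usually the quantity to compare against. A minimal sketch on a toy data frame (the data and the assumption that x is a column of df are hypothetical, not taken from the question):

```r
# Toy data frame (hypothetical); the real one has ~250 columns and over a million rows.
df <- data.frame(y = c(0, 1, 0), x = c(2.5, 1.0, 4.2))

n_cols <- length(df)     # number of columns in df
n_rows <- nrow(df)       # number of rows in df
len_x  <- length(df$x)   # number of elements in column x

# A column of df always has as many elements as df has rows.
len_x == n_rows
```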
Other StackOverflow questions seem to suggest it might be a problem with the way the formula is constructed. Or perhaps it is a problem with biglm?
I was able to solve this issue by making a slight modification to the way I was constructing my formula for bigglm().
As shown in the link attached in my question, I was constructing the formula like this:
n <- names(df)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
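To see what this construction produces, here is the same code run on a small toy data frame. The column names a and b and the helper variable rhs are made up for illustration:

```r
# Toy stand-in for the real data frame (hypothetical columns a and b).
df <- data.frame(y = c(0, 1, 0, 1), a = c(1.2, 0.4, 3.1, 2.2), b = c(5, 3, 8, 1))

n <- names(df)
# Every column except the response, joined with " + ".
rhs <- paste(n[!n %in% "y"], collapse = " + ")
f_text <- paste("y ~", rhs)
f <- as.formula(f_text)

f_text  # the formula as a string, with bare column names
```

The resulting formula refers to bare column names only, so the variables have to be found through the data argument when the model is fitted.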
What f was missing was the df$ before each variable name in the formula. Modifying the as.formula() call to prepend "df$" to each variable name fixed this issue.
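A minimal sketch of what that modification might look like, assuming the prefix is applied to the response y as well as to the predictors (the answer does not show the exact code, so this is one possible reading; the toy data and the names predictors and rhs are hypothetical):

```r
# Toy stand-in for the real data frame (hypothetical columns a and b).
df <- data.frame(y = c(0, 1, 0, 1), a = c(1.2, 0.4, 3.1, 2.2), b = c(5, 3, 8, 1))

n <- names(df)
predictors <- n[!n %in% "y"]
# Prepend "df$" to every predictor, then join the terms with " + ".
rhs <- paste0("df$", predictors, collapse = " + ")
# Assumption: the response is prefixed in the same way as the predictors.
f_text <- paste("df$y ~", rhs)
f <- as.formula(f_text)

f_text  # the formula as a string, with df$-qualified names
```

Note that a term such as df$a is evaluated against the df object itself rather than looked up as a column of the data argument, which is worth keeping in mind when the data is processed in chunks.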