I'm trying to fit a logistic regression model to my data, using glmnet (for lasso) and caret (for k-fold cross-validation). I've tried two different syntaxes, but they both throw an error:
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 3,
                           verboseIter = TRUE)
# with response as an integer (0/1)
fit_logistic <- train(response ~ .,
                      data = df_without,
                      method = "glmnet",
                      trControl = fitControl,
                      family = "binomial")
Error in cut.default(y, breaks, include.lowest = TRUE) :
invalid number of intervals
df_without$response <- as.factor(df_without$response)
# with response as a factor
fit_logistic <- train(as.matrix(df_without[1:47]), df_without$response,
                      method = "glmnet",
                      trControl = fitControl,
                      family = "binomial")
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NAs introduced by coercion
Do I need to convert my dataframe to a matrix or not?
Does my response variable need to be a factor or just 0/1 integers?
The .Rdata file with the df_without data frame is here.
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-4 plyr_1.8.2 gbm_2.1.1 survival_2.38-1 glmnet_2.0-2 foreach_1.4.2
[7] Matrix_1.2-0 caret_6.0-47 ggplot2_1.0.1 lattice_0.20-31 lubridate_1.3.3 RJDBC_0.2-5
[13] rJava_0.9-6 DBI_0.3.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 compiler_3.2.0 nloptr_1.0.4 class_7.3-12 iterators_1.0.7
[6] tools_3.2.0 digest_0.6.8 lme4_1.1-7 memoise_0.2.1 nlme_3.1-120
[11] gtable_0.1.2 mgcv_1.8-6 brglm_0.5-9 SparseM_1.6 proto_0.3-10
[16] BradleyTerry2_1.0-6 stringr_1.0.0 gtools_3.5.0 grid_3.2.0 nnet_7.3-9
[21] minqa_1.2.4 reshape2_1.4.1 car_2.0-25 magrittr_1.5 scales_0.2.4
[26] codetools_0.2-11 MASS_7.3-40 pbkrtest_0.4-2 colorspace_1.2-6 quantreg_5.11
[31] stringi_0.4-1 munsell_0.4.2
I had the same problem and fixed it by using the function model.matrix to handle the coding of categorical variables.
Try this for the x argument in glmnet:
as.matrix(model.matrix(response ~ ., data = df_without)[, -1])
I removed the intercept column because the default in glmnet is to include an intercept.
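To make the model.matrix step concrete, here is a minimal sketch on a small made-up data frame. The `toy` data and its columns `age` and `group` are hypothetical stand-ins, not the asker's df_without, and this only shows how the predictor matrix gets built, not a full caret/glmnet fit.

```r
# Hypothetical toy data standing in for df_without (not the asker's data).
toy <- data.frame(
  response = c(0, 1, 0, 1, 1, 0),
  age      = c(23, 35, 41, 29, 52, 38),
  group    = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.matrix() expands the factor `group` into 0/1 dummy columns.
# [, -1] drops the "(Intercept)" column, since glmnet fits its own intercept by default.
x <- model.matrix(response ~ ., data = toy)[, -1]

# A two-level factor is a response type that glmnet accepts for family = "binomial".
y <- factor(toy$response)

print(colnames(x))   # the numeric column plus one dummy per non-reference level
print(is.numeric(x)) # the result is a numeric matrix, with no character columns
```

By contrast, calling as.matrix() directly on a data frame that contains factor or character columns yields a character matrix, which is one plausible source of the "NAs introduced by coercion" warning in the question.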
The problem is that you have continuous variables in your dataset. glmnet needs to have factor or binary variables.
If you run your first lines of code and select a few non-continuous variables you will see that it runs as expected.