
R glm() vector too large

I am trying to run a binary logistic regression in R on a very large dataset and I keep running into memory problems. I have tried several packages that I thought would get around the issue, such as caret and biglm, but they give me the same memory error. Why is it that when I start with a dataset of 300,000 rows and 300 columns and subset it to 50,000 rows and 120 columns, it still requires the same amount of memory? That makes no sense to me. I cannot share the data because it is sensitive, but most of the variables are factors. Below are some of the things I have tried:

library(biglm)
model <- bigglm(f, data = reg, na.action = na.pass, family = binomial(link = "logit"), chunksize = 5000)

But I get:

Error: cannot allocate vector of size 128.7 Gb

library(caret)
MyControl <- trainControl(method = "repeatedcv", index = MyFolds, summaryFunction = twoClassSummary, classProbs = TRUE)
fit <- train(f, data = reg, family = binomial, trControl = MyControl)

The error message "Error: cannot allocate vector of size 128.7 Gb" does not mean that R failed to allocate a total of 128.7 Gb of memory.

Quoting Patrick Burns:

"It is because R has already allocated a lot of memory successfully. The error message is about how much memory R was going after at the point where it failed".

So it is your interpretation of the error that is wrong. The full dataset and the subset may differ considerably in size, but both are evidently still too big for your machine, and the number shown in the error message only tells you how much memory R was trying to allocate at the moment it failed, not the total size of your problem.
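If you want a rough sense of how large the model matrix will be before fitting, you can build it on a small sample of rows and extrapolate. This is only a sketch and assumes the `reg` data frame and formula `f` from your question (the sample size of 1000 is arbitrary); with many factor variables, each factor expands into one dummy column per non-reference level, which is usually what pushes the allocation into the tens of gigabytes.

## Sketch: estimate the size of the full model matrix from a small sample
smp <- reg[sample(nrow(reg), 1000), ]
mm  <- model.matrix(f, data = smp)                 # factors expand into dummy columns here
ncol(mm)                                           # number of columns after factor expansion
as.numeric(nrow(reg)) * ncol(mm) * 8 / 1024^3      # approximate GB for the full matrix (8 bytes per double)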
