rm(list = ls()) on 64GB then Error: cannot allocate vector of size 11.6 Gb when trying a decision tree
I'm trying to run a decision tree via the caret package. I start my script fresh by removing everything from memory with
rm(list = ls())
then I load my training data, which has 3M rows and 522 features. RStudio doesn't show the size in GB, but presumably from the error message it's 11.6 GB.
If I'm running R with 64 GB of RAM, is this error expected? Is there any way around it without resorting to training on smaller data?
rm(list = ls())
library(tidyverse)
library(caret)
library(xgboost)
# read in data
training_data <- readRDS("/home/myname/training_data.rds")
The RStudio environment pane currently shows one object, training_data, with the dims mentioned above.
### Modelling
# tuning & parameters
set.seed(123)
train_control <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE, # IMPORTANT!
verboseIter = TRUE,
allowParallel = TRUE
)
# Fit a decision tree (minus cad field)
print("begin decision tree regular")
mod_decisiontree <- train(
cluster ~.,
tuneLength = 5,
data = select(training_data, -c(cad, id)), # a data frame
method = "rpart",
trControl = train_control,
na.action = na.pass
)
Loading required package: rpart
Error: cannot allocate vector of size 11.6 Gb
I could ask our admin to increase my RAM, but before doing that I want to make sure I'm not missing something. Don't I have lots of RAM available if I'm on 64 GB?
Do I have any options? I tried making my data frame a matrix and passing that to caret instead, but it threw an error. Is passing a matrix instead a worthwhile endeavour?
Here is your error message reproduced:

cannot allocate vector of size 11.6 Gb when trying a decision tree
This means the specific failure happened when R requested a further 11.6 GB of memory and the request could not be satisfied. However, fitting the tree may require many such allocations, and, most likely, the remainder of free RAM was already in use.
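To see how close you already are to the limit before train() even starts, you can measure the in-memory size of your objects directly (a small sketch, assuming the training_data object from the question is loaded):

```r
# Report the in-memory footprint of the training data
format(object.size(training_data), units = "Gb")

# List all objects in the session, largest first, to see what is holding RAM
sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE)
```

Note also that the formula interface (cluster ~ .) makes caret build a full model matrix, and cross-validation holds additional copies per fold, so peak memory use can be several times the size reported for the raw data frame.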
I don't know the details of your calculation, but I would say that even running a model like this on a 1 GB data set is already very large. My advice would be to find a way to take a statistically accurate subsample of your data set, so that you don't need such large amounts of RAM.
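A minimal sketch of that subsampling suggestion, reusing the objects from your script (training_data, train_control): caret's createDataPartition draws a stratified sample, so the class proportions of cluster are preserved in the subsample. The 10% fraction here is an arbitrary starting point to tune against your memory budget.

```r
library(caret)

set.seed(123)
# Stratified 10% subsample: class balance of `cluster` is preserved
keep_idx <- createDataPartition(training_data$cluster, p = 0.10, list = FALSE)
training_sample <- training_data[keep_idx, ]

# Same call as before, but on the subsample (select() is from dplyr/tidyverse)
mod_decisiontree <- train(
  cluster ~ .,
  tuneLength = 5,
  data = select(training_sample, -c(cad, id)),
  method = "rpart",
  trControl = train_control,
  na.action = na.pass
)
```

If memory is still tight, you can also try caret's non-formula interface (train(x = ..., y = ...)), which avoids some of the copying that the formula interface does when it expands the model matrix.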