

rm(list = ls()) on 64GB then Error: cannot allocate vector of size 11.6 Gb when trying a decision tree

I'm trying to run a decision tree via the caret package. I start my script fresh by removing everything from memory with rm(list = ls()), then I load my training data, which is 3M rows and 522 features. RStudio doesn't show the size in GB, but judging by the error message it's presumably 11.6 GB.

If I'm running R on a 64 GB machine, is it expected that I see this error? Is there any way around it without resorting to training on smaller data?

rm(list = ls())
library(tidyverse)
library(caret)
library(xgboost)

# read in data
training_data <- readRDS("/home/myname/training_data.rds")

The RStudio environment pane currently shows one object, training_data, with the dims mentioned above.
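
For reference, the object's actual footprint in memory can be checked directly from the console; a minimal sketch, using the object name from the script above:

# size of the loaded data frame, in the same units the error message uses
format(object.size(training_data), units = "Gb")

# overall memory currently in use by R
gc()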

### Modelling
# tuning & parameters
set.seed(123)
train_control <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE, # IMPORTANT!
  verboseIter = TRUE,
  allowParallel = TRUE
)

# Fit a decision tree (minus cad and id fields)
print("begin decision tree regular")
mod_decisiontree <- train(
  cluster ~ .,
  tuneLength = 5,
  data = select(training_data, -c(cad, id)), # a data frame
  method = "rpart",
  trControl = train_control,
  na.action = na.pass
)

Loading required package: rpart
Error: cannot allocate vector of size 11.6 Gb

I could ask our admin to increase my RAM, but before doing that I want to make sure I'm not missing something. Don't I have lots of RAM available if I'm on 64 GB?

Do I have any options? I tried making my data frame a matrix and passing that to caret instead, but it threw an error. Is passing a matrix instead a worthwhile endeavour?
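
For what it's worth, one direction that avoids building a dense matrix at all is a sparse design matrix passed straight to xgboost (which the script already loads), bypassing caret. This is only a rough sketch under assumptions not stated in the question: that cluster is a factor, that a boosted tree is an acceptable substitute for rpart, and that num_class and nrounds below are placeholders.

library(Matrix)

# sparse design matrix with no intercept column; note that NA handling and
# row alignment with the label vector would still need attention here
x_sparse <- sparse.model.matrix(cluster ~ . - 1,
                                data = select(training_data, -c(cad, id)))

# xgboost expects integer class labels 0..(k-1)
y <- as.integer(factor(training_data$cluster)) - 1

dtrain <- xgb.DMatrix(data = x_sparse, label = y)

mod_xgb <- xgb.train(
  params  = list(objective = "multi:softprob",   # assumes multiclass classification
                 num_class = length(unique(y))),
  data    = dtrain,
  nrounds = 50                                   # placeholder value
)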

Here is your error message reproduced:

cannot allocate vector of size 11.6 Gb when trying a decision tree

This means that the specific failure happened when R requested another 11.6 GB of memory and was unable to get it. However, the tree-fitting calculation itself may require many such allocations, and, most likely, the remainder of the free RAM was already in use.
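
As a rough sanity check with the dimensions from the question, a single dense double-precision copy of the predictors, such as the model matrix that caret's formula interface is likely building here, comes out at almost exactly the size of the failed allocation:

# back-of-the-envelope: one dense numeric copy of the predictors
rows <- 3e6                 # rows, from the question
cols <- 522 - 2             # features minus the dropped cad and id columns
rows * cols * 8 / 1024^3    # 8 bytes per double  ->  ~11.6, matching the error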

I don't know the details of your calculation, but I would say that even fitting tree-based models on a 1 GB data set is already a large job. My advice would be to find a way to take a statistically representative subsample of your data set so that you don't need such large amounts of RAM.
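
A minimal sketch of one way to do that with caret itself, using createDataPartition so the subsample is stratified on cluster and keeps its class proportions (the 10% fraction is an arbitrary placeholder):

set.seed(123)
# createDataPartition samples within each level of the outcome,
# so the class balance of cluster is preserved in the subsample
idx <- createDataPartition(training_data$cluster, p = 0.10, list = FALSE)
training_small <- training_data[idx, ]

# then fit as before, e.g.
# mod_small <- train(cluster ~ ., tuneLength = 5,
#                    data = select(training_small, -c(cad, id)),
#                    method = "rpart", trControl = train_control,
#                    na.action = na.pass)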
