
Predict memory usage in R

I have downloaded a huge file (~300 MB) from the UCI Machine Learning Dataset library.

Is there a way to predict the memory required to load the dataset, before loading it into R memory?

I have Googled a lot, but everywhere all I could find was how to calculate memory usage with the R profiler and several other packages, and only after loading the objects into R.

Based on the "R Programming" Coursera course, you can calculate the approximate memory usage using the number of rows and columns in the data. You can get that info from the codebook/meta file:

memory required = no. of columns * no. of rows * 8 bytes/numeric

So, for example, if you have 1,500,000 rows and 120 columns, you will need more than 1.34 GB of spare memory.
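As a quick sanity check of that arithmetic in an R session (counting 2^30 bytes per GB):

# 1,500,000 rows x 120 numeric columns, 8 bytes each
1500000 * 120 * 8 / 2^30  # ~1.34 GB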

You can also apply the same approach to other types of data, paying attention to the number of bytes used to store the different data types.
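As a rough illustration (a minimal sketch; object.size includes a small fixed overhead on top of the raw element storage), the per-element sizes of the common atomic types can be checked directly:

# Approximate per-element storage for common atomic vector types
object.size(numeric(1e6))  # ~8 MB: 8 bytes per double
object.size(integer(1e6))  # ~4 MB: 4 bytes per integer
object.size(logical(1e6))  # ~4 MB: logicals are stored as 4-byte integers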

If your data is stored in a CSV file, you could first read in a subset of the file and calculate its memory usage in bytes with the object.size function. Then, you could compute the total number of lines in the file with the wc command-line utility and use the line count to scale the memory usage of your subset into an estimate of the total usage:

top.size <- object.size(read.csv("simulations.csv", nrows = 1000))  # size of the first 1,000 rows
lines <- as.numeric(gsub("[^0-9]", "", system("wc -l simulations.csv", intern = TRUE)))  # total line count
size.estimate <- lines / 1000 * top.size  # scale up to the full file

Presumably there's some object overhead, so I would expect size.estimate to overestimate the total memory usage when you load the whole CSV file; this effect will be diminished if you use more lines to compute top.size. Of course, this approach could be inaccurate if the first 1000 lines of your file are not representative of the overall file contents.
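One way to gauge that overhead (a sketch, reusing the same simulations.csv file) is to compare per-line estimates computed from subsets of different sizes; the larger subset dilutes the fixed per-object overhead:

bytes_per_line <- function(n) {
  as.numeric(object.size(read.csv("simulations.csv", nrows = n))) / n
}
bytes_per_line(1000)   # per-line estimate from the first 1,000 lines
bytes_per_line(10000)  # estimate from 10,000 lines; closer to the true average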

R has the function object.size(), which provides an estimate of the memory being used to store an R object. You can use it like this:

predict_data_size <- function(numeric_size, number_type = "numeric") {
  if (number_type == "integer") {
    byte_per_number <- 4
  } else if (number_type == "numeric") {
    byte_per_number <- 8  # 8 bytes per double-precision number
  } else {
    stop(sprintf("Unknown number_type: %s", number_type))
  }
  estimate_size_in_bytes <- numeric_size * byte_per_number
  class(estimate_size_in_bytes) <- "object_size"  # reuse the object_size print method
  print(estimate_size_in_bytes, units = "auto")
}
# Example
# Matrix (rows=2000000, cols=100)
predict_data_size(2000000*100, "numeric") # 1.5 Gb
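The same function covers the integer case; for the matrix above stored as integers (4 bytes each), it would report roughly half the size:

predict_data_size(2000000*100, "integer") # ~762.9 Mb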
