
Is there a way to guess the size of data.frame based on rows, columns and variable types?

I expect to generate a lot of data and then read it into R. How can I estimate the size of the data.frame (and thus the memory needed) from the number of rows, the number of columns and the variable types?

Example.

If I have 10000 rows and 150 columns, of which 120 are numeric, 20 are strings and 10 are factors, what size of data frame can I expect? Will the result change depending on the data stored in the columns (as in max(nchar(column)))?

> m <- matrix(1,nrow=1e5,ncol=150)
> m <- as.data.frame(m)
> object.size(m)
120009920 bytes
> a=object.size(m)/(nrow(m)*ncol(m))
> a
8.00066133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.character)
> b=object.size(m)/(nrow(m)*ncol(m))
> b
4.00098133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.factor)
> c=object.size(m)/(nrow(m)*ncol(m))
> c
4.00098133333333 bytes
> m <- matrix("ajayajay",nrow=1e5,ncol=150)
> 
> m <- as.data.frame(m)
> object.size(m)
60047120 bytes
> d=object.size(m)/(nrow(m)*ncol(m))
> d
4.00314133333333 bytes

You can simulate an object of the expected shape and use object.size to estimate the memory needed to store it as an R object:

m <- matrix(1,nrow=1e5,ncol=150)
m <- as.data.frame(m)
m[,1:20] <- sapply(m[,1:20],as.character)
m[,29:30] <- sapply(m[,29:30],as.factor)
object.size(m)
120017224 bytes
print(object.size(m),units="Gb")
0.1 Gb
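For the layout in the question (10000 rows; 120 numeric, 20 character and 10 factor columns), a back-of-the-envelope estimate can be compared against a simulated data frame. The sketch below is only an illustration: the helper names `estimate_df_bytes` and `make_cols` are hypothetical, and the per-element costs (8 bytes per double, 4 bytes per factor code, 8 bytes per string pointer) assume a 64-bit build of R. Character columns also pay for each distinct string once, so their size depends on how many unique values there are and how long they are, not just on the row count.

```r
# Assumed per-element costs on a 64-bit build of R:
#   numeric (double) = 8 bytes, factor (integer codes) = 4 bytes,
#   character = 8 bytes per pointer (distinct strings are stored on top of that).
estimate_df_bytes <- function(n_rows, n_numeric = 0, n_factor = 0, n_character = 0) {
  n_rows * (8 * n_numeric + 4 * n_factor + 8 * n_character)
}

n <- 1e4
# 1e4 * (8 * 120 + 4 * 10 + 8 * 20) = 11,600,000 bytes
est <- estimate_df_bytes(n, n_numeric = 120, n_factor = 10, n_character = 20)

# Build a data frame with the same layout to compare against the estimate.
# lapply() is used so that factor columns stay factors; sapply() would
# simplify the result to a matrix, which cannot hold factors.
make_cols <- function(k, prefix, gen) {
  cols <- lapply(seq_len(k), function(i) gen())
  names(cols) <- paste0(prefix, seq_len(k))
  cols
}

set.seed(1)
cols <- c(
  make_cols(120, "num", function() rnorm(n)),
  make_cols(20, "chr", function() sample(c("ajay", "vijay"), n, replace = TRUE)),
  make_cols(10, "fac", function() factor(sample(letters[1:3], n, replace = TRUE)))
)
df <- as.data.frame(cols, stringsAsFactors = FALSE)

actual <- as.numeric(object.size(df))
cat("estimate:", est, "bytes; object.size:", actual, "bytes\n")
```

The measured size should come out slightly above the estimate, because every column carries a small fixed header and the data frame stores names, levels and other attributes.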

Check out the pryr package as well. It has object_size, which may suit you slightly better. From Advanced R:

This function is better than the built-in object.size() because it accounts for shared elements within an object and includes the size of environments.
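The difference shows up when one vector is referenced several times. In this small sketch (the variable names are illustrative), object.size() counts the vector once per reference; the pryr call is guarded because the package may not be installed.

```r
x <- runif(1e5)
shared <- list(x, x, x)  # three references to the same vector

size_x <- as.numeric(object.size(x))
size_shared <- as.numeric(object.size(shared))

# object.size() adds up each list element separately,
# so the shared vector is counted three times.
cat("one vector:", size_x, "bytes; list of three:", size_shared, "bytes\n")

# pryr::object_size() is meant to count shared elements only once.
if (requireNamespace("pryr", quietly = TRUE)) {
  print(pryr::object_size(shared))
}
```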

You also need to account for the size of the attributes, in addition to the column types etc.

object.size(attributes(m))

You could create dummy variables that hold examples of the data you will be storing in the data frame.

Then use object.size() to find their size and multiply by the number of rows and columns accordingly.
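A minimal sketch of that approach, assuming the sample values below are representative of the real data (they are made up here, as are the variable names): measure one sample column per type, convert to bytes per row, and scale to the target shape from the question.

```r
n_sample <- 1000
set.seed(1)

# One representative sample column per type (hypothetical example values).
samples <- list(
  numeric   = rnorm(n_sample),
  character = sample(c("ajay", "vijay"), n_sample, replace = TRUE),
  factor    = factor(sample(letters[1:3], n_sample, replace = TRUE))
)

# Bytes per row for each type, as measured by object.size().
bytes_per_row <- sapply(samples, function(x) as.numeric(object.size(x))) / n_sample

# Target shape: 10000 rows; 120 numeric, 20 character, 10 factor columns.
n_cols <- c(numeric = 120, character = 20, factor = 10)
target_rows <- 1e4

estimate <- target_rows * sum(bytes_per_row[names(n_cols)] * n_cols)
cat("estimated size:", estimate, "bytes\n")
```

Because each sample also includes its fixed per-vector overhead, scaling it up tends to overestimate slightly; a larger sample reduces that effect.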
