
ffdf object consumes extra RAM (in GB)

I have decided to test the key advantage of the ff package: minimal RAM allocation (PC specs: i5, 8 GB RAM, Win7 64-bit, RStudio).

According to the package description, we can manipulate physical objects (files) as if they were virtual objects allocated in RAM. Thus, actual RAM usage is reduced greatly (from GB to KB). The code I used is as follows:

library(ff)
library(ffbase)

setwd("D:/My_package/Personal/R/reading")

# Build a 100-million-row, 3-column numeric matrix and write it to CSV
x <- cbind(rnorm(1e8), rnorm(1e8), 1:1e8)
system.time(write.csv2(x, "test.csv", row.names = FALSE))

# Read it back as an ffdf object in chunks
system.time(x <- read.csv2.ffdf(file = "test.csv", header = TRUE,
                                first.rows = 100000, next.rows = 100000000,
                                levels = NULL))
print(object.size(x) / 1024 / 1024)  # reported size of the ffdf object, in MB
print(class(x))

The actual file size is 4.5 GB, and the actual RAM usage varies as follows (per Task Manager): 2.92 GB -> upper limit (~8 GB) -> 5.25 GB. The object size reported by object.size() is about 12 KB.

My concern is about the extra RAM allocation (~2.3 GB). According to the package description, RAM usage should have increased by only about 12 KB. I don't use any character columns.

Maybe I have missed something about the ff package.

Well, I have found a solution that eliminates the extra RAM usage.

First of all, it is necessary to pay attention to the 'first.rows' and 'next.rows' arguments of 'read.table.ffdf' in the ff package.

The first argument ('first.rows') sets the size of the initial chunk in rows, and therefore the initial memory allocation. I used the default value (1000 rows).

The extra memory allocation is governed by the second argument ('next.rows'). If you want an ffdf object without extra RAM allocations (in my case, gigabytes), you need to pick a row count for the next chunk such that the chunk's size does not exceed the value of getOption("ffbatchbytes").
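
As a quick sanity check, you can estimate the largest 'next.rows' that still fits that budget. This is a minimal sketch assuming the three numeric (double) columns of the test data above; the bytes-per-row figure is my estimate, not something ff reports:

library(ff)

# Per-chunk byte budget that ff uses when reading
batch_bytes <- getOption("ffbatchbytes")

# 3 numeric columns at 8 bytes per double (estimate for the test data above)
bytes_per_row <- 3 * 8

# Largest chunk, in rows, that fits within the budget
max_next_rows <- floor(batch_bytes / bytes_per_row)
print(max_next_rows)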

In my case I used 'first.rows=1000' and 'next.rows=1000', and the total RAM allocation varied by up to 1 MB in Task Manager. Increasing 'next.rows' to 10000 caused RAM growth of 8-9 MB.
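
For reference, this is a minimal sketch of the corrected read call with those small chunk sizes (same test.csv as above):

library(ff)
library(ffbase)

# Small chunks keep each batch well below getOption("ffbatchbytes")
x <- read.csv2.ffdf(file = "test.csv", header = TRUE,
                    first.rows = 1000, next.rows = 1000)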

So these arguments are a subject for your own experiments to pick the best proportion.

Besides, you must keep in mind that increasing 'next.rows' also affects the time needed to build the ffdf object: larger chunks are faster (timings over several runs):

'first.rows=1000' and 'next.rows=1000': around 1500 sec. (RAM ~1 MB)
'first.rows=1000' and 'next.rows=10000': around 230 sec. (RAM ~9 MB)
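
To reproduce the comparison, something like the following helper can time one configuration at a time (the helper itself is a sketch; only the timings above were actually measured):

# Hypothetical helper: time one read with a given 'next.rows'
time_read <- function(next_rows) {
  system.time(
    read.csv2.ffdf(file = "test.csv", header = TRUE,
                   first.rows = 1000, next.rows = next_rows)
  )
}
print(time_read(1000))    # ~1500 sec., RAM ~1 MB
print(time_read(10000))   # ~230 sec., RAM ~9 MB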
