
How to use apply or sapply or lapply with ffdf?

Is there a way to use an apply-style construct directly on the columns of an ffdf object? I am trying to count the NAs in each column without having to convert it to a standard data frame. I can get the NA count for an individual column using:

sum(is.na(ffdf$columnname))

But is there a way to do this for all the columns in the dataframe at once, something like:

lapply(ffdf, function(x){sum(is.na(x))})

When I run this I get:

$virtual
[1] 0

$physical
[1] 0

$row.names
[1] 0

I have not been able to find a special version of lapply or sapply in the ff documentation. Further, is there a simple way to count the NAs over the entire ffdf in one go?

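An ffdf is basically a list with the elements "virtual", "physical" and "row.names", which is why lapply over the ffdf itself returns those three entries. The columns live in the physical element, so if you do an lapply over that, you have what you want.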

library(ffbase)  # provides is.na.ff and sum.ff
myffdf <- as.ffdf(iris)
# physical() returns the list of ff vectors that back the columns
lapply(physical(myffdf), FUN = function(x) sum(is.na(x)))

Because is.na and sum are generic, these calls dispatch to is.na.ff and sum.ff from the ffbase package, so the data is loaded into RAM chunk by chunk, in pieces sized to what your computer can handle.
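
For the second part of the question (counting NAs over the whole ffdf in one go), one possible approach is to collect the per-column counts with sapply and add them up. This is only a sketch: the NA values injected into iris and the names df, na_per_column and na_total are made up for illustration, and it assumes every column is stored as its own ff vector, as is the case after as.ffdf on an ordinary data frame.

```r
library(ffbase)

# Illustrative data: iris has no missing values, so inject two NAs
# into one numeric column to have something to count.
df <- iris
df$Sepal.Length[c(1, 5)] <- NA
myffdf <- as.ffdf(df)

# One NA count per column; sum(is.na(x)) runs chunkwise on each ff vector
na_per_column <- sapply(physical(myffdf), function(x) sum(is.na(x)))

# Total NA count over the entire ffdf
na_total <- sum(na_per_column)

print(na_per_column)
print(na_total)
```

Using sapply rather than lapply gives a plain numeric vector, which makes the final sum a one-liner. Only the small vector of counts is held in RAM beyond the chunks being processed.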


 