Exporting an FFDF without NA values

Question

I need to share data sets that I've imported into R as ffdf objects. I want to export them to CSV easily, without writing out NA values, which just inflate the size of the output file.

If I were working with a plain data frame, I would use the following syntax:

write.csv(df, "C:/path/data.csv", row.names=FALSE, na="")

But the write.csv.ffdf function doesn't seem to take "na" as an argument. Can anyone tell me the correct syntax, so that I don't have to post-process the output file to remove the NA values?

Answer 1

I think you are making an inaccurate characterization of the behavior of write.csv.ffdf.

require(ff)
# What follows is a minor modification of the first example in the `write.*` help page.

> x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=c(NA, 2:26), 
                  dbl=c(1:25,NA) + 0.1, fac=factor(c(letters[2:26], NA)),
                  ord=c(NA, ordered(LETTERS[2:26])), dct=Sys.time()+1:26, 
                  dat=seq(as.Date("1910/1/1"), length.out=26, by=1))
> ffx <- as.ffdf(x)
> write.csv(ffx, na="")
"","log","int","dbl","fac","ord","dct","dat"
"1",FALSE,,1.1,"b",,2012-12-18 12:18:23,1910-01-01
"2",TRUE,2,2.1,"c",1,2012-12-18 12:18:24,1910-01-02
"3",FALSE,3,3.1,"d",2,2012-12-18 12:18:25,1910-01-03
"4",TRUE,4,4.1,"e",3,2012-12-18 12:18:26,1910-01-04
"5",FALSE,5,5.1,"f",4,2012-12-18 12:18:27,1910-01-05
"6",TRUE,6,6.1,"g",5,2012-12-18 12:18:28,1910-01-06
"7",FALSE,7,7.1,"h",6,2012-12-18 12:18:29,1910-01-07
"8",TRUE,8,8.1,"i",7,2012-12-18 12:18:30,1910-01-08
"9",FALSE,9,9.1,"j",8,2012-12-18 12:18:31,1910-01-09
"10",TRUE,10,10.1,"k",9,2012-12-18 12:18:32,1910-01-10
"11",FALSE,11,11.1,"l",10,2012-12-18 12:18:33,1910-01-11
"12",TRUE,12,12.1,"m",11,2012-12-18 12:18:34,1910-01-12
"13",FALSE,13,13.1,"n",12,2012-12-18 12:18:35,1910-01-13
"14",TRUE,14,14.1,"o",13,2012-12-18 12:18:36,1910-01-14
"15",FALSE,15,15.1,"p",14,2012-12-18 12:18:37,1910-01-15
"16",TRUE,16,16.1,"q",15,2012-12-18 12:18:38,1910-01-16
"17",FALSE,17,17.1,"r",16,2012-12-18 12:18:39,1910-01-17
"18",TRUE,18,18.1,"s",17,2012-12-18 12:18:40,1910-01-18
"19",FALSE,19,19.1,"t",18,2012-12-18 12:18:41,1910-01-19
"20",TRUE,20,20.1,"u",19,2012-12-18 12:18:42,1910-01-20
"21",FALSE,21,21.1,"v",20,2012-12-18 12:18:43,1910-01-21
"22",TRUE,22,22.1,"w",21,2012-12-18 12:18:44,1910-01-22
"23",FALSE,23,23.1,"x",22,2012-12-18 12:18:45,1910-01-23
"24",TRUE,24,24.1,"y",23,2012-12-18 12:18:46,1910-01-24
"25",FALSE,25,25.1,"z",24,2012-12-18 12:18:47,1910-01-25
"26",TRUE,26,,,25,2012-12-18 12:18:48,1910-01-26

If your goal is minimizing the RAM footprint during write operations, then first look at:

getOption("ffbatchbytes")
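
As a minimal sketch of how that option can be inspected and changed: as far as I know, ffbatchbytes is the default for the BATCHBYTES argument of write.table.ffdf, which caps the RAM used per chunk. The 8 MiB figure and the names old and batch_bytes are arbitrary choices for illustration, not recommendations.

```r
library(ff)

# Current per-chunk RAM budget in bytes (ff sets this option when it is loaded).
getOption("ffbatchbytes")

# Temporarily lower the budget to 8 MiB (an arbitrary illustrative value).
old <- options(ffbatchbytes = 8 * 2^20)
batch_bytes <- getOption("ffbatchbytes")

# Restore the previous setting.
options(old)
```

A smaller value should mean more, smaller chunks per write, trading speed for a lower peak memory footprint.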

Answer 2

write.csv.ffdf does not have an na parameter, but write.table.ffdf passes the na parameter on to the write.table function that it wraps. Just add sep="," as well and you get CSV output with empty fields in place of NA.

This will work even for large ffdf objects.
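
A minimal sketch of that call, assuming a small made-up data set (the column names and the temporary output file are illustrative only):

```r
library(ff)

# Small illustrative data set with missing values (hypothetical data).
df <- data.frame(int = c(NA, 2:5),
                 dbl = c(1:4, NA) + 0.1,
                 fac = factor(c("a", "b", NA, "d", "e")))
ffx <- as.ffdf(df)

# Temporary output file, used here only for demonstration.
out <- tempfile(fileext = ".csv")

# Extra arguments such as `sep`, `na` and `row.names` are forwarded
# to write.table, so NA values are written as empty fields.
write.table.ffdf(ffx, file = out, sep = ",", na = "", row.names = FALSE)

# Inspect the result.
lines <- readLines(out)
print(lines)
```

Replace the temporary file with a real path to write the CSV that will be shared.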

solution1
1 ACCEPTED 2012-12-18 20:23:03

solution2
0 2013-08-14 17:37:22
