
Exporting an FFDF without NA values

I need to share data sets that I've imported into R as ffdf objects. My aim is to export these ffdf data sets to CSV easily, without NA values being written out, since they only inflate the size of the output file.

If I were working with a simple dataframe, I would use the following syntax:

write.csv(df, "C:/path/data.csv", row.names=FALSE, na="")

But the write.csv.ffdf function doesn't seem to take "na" as an argument. Can anyone tell me the correct syntax so that I don't have to do post processing on the output file to take away the NA values?

I think you are mischaracterizing the behavior of write.csv.ffdf.

require(ff)
# What follows is a minor modification of the first example in the `write.*` help page.

> x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=c(NA, 2:26), 
                  dbl=c(1:25,NA) + 0.1, fac=factor(c(letters[2:26], NA)),
                  ord=c(NA, ordered(LETTERS[2:26])), dct=Sys.time()+1:26, 
                  dat=seq(as.Date("1910/1/1"), length.out=26, by=1))
>  ffx <- as.ffdf(x)
> write.csv(ffx, na="")
"","log","int","dbl","fac","ord","dct","dat"
"1",FALSE,,1.1,"b",,2012-12-18 12:18:23,1910-01-01
"2",TRUE,2,2.1,"c",1,2012-12-18 12:18:24,1910-01-02
"3",FALSE,3,3.1,"d",2,2012-12-18 12:18:25,1910-01-03
"4",TRUE,4,4.1,"e",3,2012-12-18 12:18:26,1910-01-04
"5",FALSE,5,5.1,"f",4,2012-12-18 12:18:27,1910-01-05
"6",TRUE,6,6.1,"g",5,2012-12-18 12:18:28,1910-01-06
"7",FALSE,7,7.1,"h",6,2012-12-18 12:18:29,1910-01-07
"8",TRUE,8,8.1,"i",7,2012-12-18 12:18:30,1910-01-08
"9",FALSE,9,9.1,"j",8,2012-12-18 12:18:31,1910-01-09
"10",TRUE,10,10.1,"k",9,2012-12-18 12:18:32,1910-01-10
"11",FALSE,11,11.1,"l",10,2012-12-18 12:18:33,1910-01-11
"12",TRUE,12,12.1,"m",11,2012-12-18 12:18:34,1910-01-12
"13",FALSE,13,13.1,"n",12,2012-12-18 12:18:35,1910-01-13
"14",TRUE,14,14.1,"o",13,2012-12-18 12:18:36,1910-01-14
"15",FALSE,15,15.1,"p",14,2012-12-18 12:18:37,1910-01-15
"16",TRUE,16,16.1,"q",15,2012-12-18 12:18:38,1910-01-16
"17",FALSE,17,17.1,"r",16,2012-12-18 12:18:39,1910-01-17
"18",TRUE,18,18.1,"s",17,2012-12-18 12:18:40,1910-01-18
"19",FALSE,19,19.1,"t",18,2012-12-18 12:18:41,1910-01-19
"20",TRUE,20,20.1,"u",19,2012-12-18 12:18:42,1910-01-20
"21",FALSE,21,21.1,"v",20,2012-12-18 12:18:43,1910-01-21
"22",TRUE,22,22.1,"w",21,2012-12-18 12:18:44,1910-01-22
"23",FALSE,23,23.1,"x",22,2012-12-18 12:18:45,1910-01-23
"24",TRUE,24,24.1,"y",23,2012-12-18 12:18:46,1910-01-24
"25",FALSE,25,25.1,"z",24,2012-12-18 12:18:47,1910-01-25
"26",TRUE,26,,,25,2012-12-18 12:18:48,1910-01-26

If your goal is minimizing the RAM footprint during write operations, then first look at:

getOption("ffbatchbytes") 

write.csv.ffdf does not have an na parameter, but write.table.ffdf passes the na parameter through to the write.table1 function that it wraps. Add sep="," as well to get CSV output, and you are good to go.

This will work even for large ff variables.
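As a minimal sketch of that suggestion (the data frame and the temporary output path are made up for illustration, and it assumes the ff package is installed), the call could look like this:

```r
library(ff)

# Small illustrative data frame with missing values (hypothetical data).
df <- data.frame(id    = 1:4,
                 score = c(1.5, NA, 3.5, NA),
                 grp   = factor(c("a", NA, "b", "a")))
ffx <- as.ffdf(df)

# Hypothetical output location; replace with your own path.
out_file <- tempfile(fileext = ".csv")

# sep = "," produces comma-separated output;
# na = "" writes missing values as empty fields instead of NA.
write.table.ffdf(ffx, file = out_file, sep = ",", na = "", row.names = FALSE)

# Read the file back as plain text to inspect what was written.
csv_lines <- readLines(out_file)
```

Printing csv_lines should show empty fields where the missing values were. If memory use during the write is a concern, write.table.ffdf also accepts a BATCHBYTES argument, which defaults to getOption("ffbatchbytes").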
