data.table R fwrite bug on Red Hat Linux

I have been using data.table (v1.10) and noticed a bug when using fwrite. Some background:

sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

The machine has multiple cores.

Generate some data

library(data.table)

#Generate some data
rows = 2500000
set.seed(Sys.time())
DF <- data.frame(index = 1:rows,
             catsA = sample((letters[1:10]),rows,replace=T),
             catsB = sample((letters[1:10]),rows,replace=T),
             catsC = sample((letters[1:10]),rows,replace=T),
             catsD = sample((letters[1:10]),rows,replace=T),
             catsE = sample((letters[1:10]),rows,replace=T),
             valueA = round(rnorm(rows),3),
             valueB = rpois(rows, lambda = 4))

#Convert to data.table
DT <- data.table(DF) 
#Create a new column
DT <- DT[,valueNew := valueA*valueB]

#Write
write.csv(DT,file="DT_write_csv.csv",row.names=F)
fwrite(DT, file = "DT_fwrite.csv",row.names=F)

Read back in and join

#Read back in and join
DT_csv <- fread("DT_write_csv.csv")
DT_fwrite <- fread("DT_fwrite.csv")

setkey(DT_csv,"index")
setkey(DT_fwrite,"index")
join_DT <- DT_csv[DT_fwrite]

Compare

nrow(join_DT[valueNew != i.valueNew])
[1] 1
join_DT[valueNew != i.valueNew,.(index,valueNew,i.valueNew)]
   index valueNew i.valueNew
1: 67097    2.855       5.71
DT[index==67097,.(valueNew)]
   valueNew
1:    2.855 

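The comparison shows that the original DT holds the value 2.855 for this row, and so does the file written by write.csv, while the file written by fwrite contains 5.71. So fwrite corrupted the value. Sometimes more than one row is affected, and in a real-life example the corruption propagated across many columns.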
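Because the corruption can show up in more than one column, the single-column check above can be extended to every shared column. The following is only a sketch: `count_mismatches` is a hypothetical helper (not part of data.table), it assumes both tables have the same columns and a unique key column, and the two small tables are made-up data with one deliberately altered value.

```r
library(data.table)

# Hypothetical helper: for each non-key column, count the rows where the
# two tables disagree after joining them on the key column.
count_mismatches <- function(a, b, key_col = "index") {
  joined <- merge(a, b, by = key_col, suffixes = c(".a", ".b"))
  cols <- setdiff(names(a), key_col)
  sapply(cols, function(col) {
    x <- joined[[paste0(col, ".a")]]
    y <- joined[[paste0(col, ".b")]]
    if (is.numeric(x) && is.numeric(y)) {
      # Small tolerance so text round-tripping of doubles is not flagged
      sum(abs(x - y) > 1e-9, na.rm = TRUE)
    } else {
      sum(x != y, na.rm = TRUE)
    }
  })
}

# Made-up illustration: 'bad' differs from 'good' in exactly one value
good <- data.table(index = 1:3,
                   catsA = c("a", "b", "c"),
                   valueNew = c(2.855, 1.2, 0))
bad <- copy(good)
bad[index == 2, valueNew := 2.4]

result <- count_mismatches(good, bad)
print(result)
```

In the setup from the question, the same helper could be called as `count_mismatches(DT_csv, DT_fwrite)` to see which columns are affected and how often.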

Am I doing something wrong with fwrite?

Yes, there is a bug in fwrite. It was fixed in dev last week and I'll try to get it to CRAN soon. Please check the NEWS link at the top of the homepage, bug fix item 3:

fwrite() could write floating point values incorrectly, #1968. A thread-local variable was incorrectly thread-global. This variable's usage lifetime is only a few clock cycles so it needed large data and many threads for several threads to overlap their usage of it and cause the problem. Many thanks to @mgahan and @jmosser for finding and reporting.

Please try from dev by typing the command here. I know that dev is currently failing Travis (for an unrelated reason), which is why the installation command has been set up to install the last-passing commit from dev and therefore should be OK.
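Since the NEWS item says the problem needs several threads overlapping, a possible stopgap until the fixed version is installed is to write with a single thread and then verify the round trip. This is a sketch based on that description, not a confirmed workaround: it assumes an fwrite that accepts the `nThread` argument, and `DT_small` and the temporary file are made up for illustration.

```r
library(data.table)

# Small illustrative table (made-up data, not the original 2.5M rows)
set.seed(1)
DT_small <- data.table(index = 1:1000,
                       valueA = round(rnorm(1000), 3),
                       valueB = rpois(1000, lambda = 4))
DT_small[, valueNew := valueA * valueB]

# Write with one thread so that no two threads can use the shared
# variable at the same time (assumption: no concurrency, no race)
out_file <- tempfile(fileext = ".csv")
fwrite(DT_small, file = out_file, nThread = 1)

# Read back and count values that differ from the in-memory table
DT_back <- fread(out_file)
mismatches <- sum(abs(DT_back$valueNew - DT_small$valueNew) > 1e-9)
print(mismatches)
```

Single-threaded writing is slower on large tables, so it only makes sense as a temporary measure. The same read-back check can be rerun on the full data after upgrading to confirm the fix.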
