fread/fwrite introduces garbage values

Data file data.dat:

5625010350032.36719 5627008621379.12591 5628763999478.55791 5630383772880.98831 5632384688238.96095 5633992371569.87936 5635830220975.76879 5637713568911.67183 5639436594135.51215 5641160625591.58400 5643072053703.23919 5644920788572.33232 5646668772882.99855 5648398453919.33759 5650178043246.84799 5651842484825.03887 5653671759113.42399 5655374735235.55599 5657184594518.72287 5658951103084.33839 5660687853998.58127 5662491073242.24399 

The following code

x1 <- data.matrix(data.table::fread("data.dat")) # Read it
plot(x1[1,])                                     
data.table::fwrite(x=x1, file="xout.dat", sep=" ") # Write it 
x2 <- data.matrix(data.table::fread("xout.dat"))   # Read it again
lines(x2[1,], col='red')

reveals that the element x2[1,13] takes the value 2.7898250541260385e-311 when it should in fact be equal to x1[1,13]. What is causing the garbage values to be introduced?

The data.dat file is written by a C++ program in the following way:

    std::ofstream file("data.dat", std::ios::out);
    file << std::setprecision(std::numeric_limits<long double>::digits10) << std::showpoint;
    for (size_t i = 0; i < v.size(); ++i)
        file << v[i] << " ";
    file << std::endl;

where the vector v contains the values written to data.dat. I am using data.table version 1.14.2 and R 4.1.3.

Apparently some rounding happens when the data is written back out, and fread then stores the 13th value as integer64. From the fread documentation: "integer64" (default) reads columns detected as containing integers larger than 2^31 as type bit64::integer64. The 13th value, 5646668772882.99855, is the only one whose rounded form is a whole number (5646668772883), which would explain why it is the only column affected.
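The rounding step can be reproduced outside R. The sketch below is an illustration only: it assumes the writer keeps at most 15 significant digits and uses printf's `%g` as a stand-in for that, which is not data.table's actual code. The helper names `format_sig` and `looks_like_integer` are made up for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format x with at most `digits` significant digits, dropping trailing zeros
// (printf %g behaviour). ASSUMPTION: this only approximates how the values
// end up in xout.dat; it is not fwrite's real implementation.
std::string format_sig(double x, int digits = 15) {
    assert(digits > 0 && digits <= 17);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return buf;
}

// True when the text consists only of an optional sign and decimal digits,
// i.e. a type-guessing reader would see an integer, not a floating-point value.
bool looks_like_integer(const std::string& s) {
    return !s.empty() && s.find_first_not_of("-0123456789") == std::string::npos;
}
```

Under that assumption, `format_sig(5646668772882.99855)` gives `5646668772883`, which `looks_like_integer` accepts, while `format_sig(5625010350032.36719)` gives `5625010350032.37` and still reads as a floating-point number.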


What you can do is force the column to be interpreted as numeric by adding colClasses = c("numeric") to your fread call.

x2 <- data.matrix(data.table::fread("xout.dat", colClasses = c("numeric")))

This does not prevent the floating-point rounding, but it stops the 13th value from being changed completely.
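The size of the garbage value is itself a clue. It matches what one would get if the 64-bit integer 5646668772883 had its raw bits read as a double, which is plausible because bit64::integer64 keeps its integers in double storage. A minimal sketch of that reinterpretation follows; `bits_as_double` is a name invented here, and whether data.matrix really takes this path is an assumption.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "this sketch assumes 64-bit doubles");

// Copy the raw bits of a 64-bit integer into a double without converting
// the value. ASSUMPTION: this mimics integer64 data being treated as plain
// numeric storage; it is not code taken from R or data.table.
double bits_as_double(std::int64_t v) {
    double d;
    std::memcpy(&d, &v, sizeof d);
    return d;
}
```

On an IEEE 754 platform, `bits_as_double(5646668772883)` is a subnormal number of about 2.79e-311, in line with the value reported for x2[1,13].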

If we now compute x1-x2, we see the same sort of small differences for all values.

x1-x2

#              V1         V2         V3         V4         V5          V6          V7        V8        V9       V10         V11       V12
# [1,] -0.0029297 -0.0039062 -0.0019531 -0.0019531 0.00097656 -0.00097656 -0.00097656 0.0019531 0.0019531 0.0039062 -0.00097656 0.0019531
#              V13        V14        V15         V16       V17        V18       V19         V20        V21       V22
# [1,] -0.00097656 -0.0019531 -0.0019531 -0.00097656 0.0039062 -0.0039062 0.0029297 -0.00097656 0.00097656 0.0039062
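These differences are all multiples of 0.0009765625 (2^-10), the gap between neighbouring doubles at this magnitude, and all stay below 0.005, as expected if only two decimal places survive the write. The sketch below checks this for single values, again assuming 15 significant digits via `%.15g` as a stand-in for the writer. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Gap between x and the next representable double above it.
double spacing_above(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// x minus the value recovered after writing x with 15 significant digits
// and parsing the text back. ASSUMPTION: %.15g approximates the writer.
double roundtrip_error(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.15g", x);
    return x - std::strtod(buf, nullptr);
}
```

For instance, `roundtrip_error(5625010350032.36719)` is -3/1024 = -0.0029296875 and `roundtrip_error(5646668772882.99855)` is -1/1024 = -0.0009765625, matching the V1 and V13 entries above.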
