Possible bug in R all.equal

I have run into some strange behavior in R's all.equal function. Basically, I create the same data.frame in two different ways and then call all.equal on the results (checking both data and attributes).

The code to reproduce the behavior is as follows:

var.a <- data.frame(cbind(as.integer(c(1,5,9)), as.integer(c(1,5,9))))
colnames(var.a) <- c("C1", "C2")
rownames(var.a) <- c("1","5","9")

var.b <- data.frame(matrix(NA, nrow = 10, ncol = 2))
var.b[, 1] <- 1:10
var.b[, 2] <- 1:10
colnames(var.b) <- c("C1", "C2")
var.b <- var.b[seq(1, nrow(var.b), 4), ]

all.equal(var.a, var.b)

Is this a bug or am I just missing something? I did quite a bit of debugging of the all.equal function, and the problem appears to be the rownames of the data.frames (in one case they are a character vector, in the other a numeric vector). The response of the all.equal function:

[1] "Attributes: < Component 2: Modes: character, numeric >"
[2] "Attributes: < Component 2: target is character, current is numeric >"

However,

typeof(rownames(var.a)) == typeof(rownames(var.b))

returns TRUE, which confuses me.

PS: The structure of the objects seems the same:

> str(var.a)
'data.frame':   3 obs. of  2 variables:
$ C1: int  1 5 9
$ C2: int  1 5 9
> str(var.b)
'data.frame':   3 obs. of  2 variables:
$ C1: int  1 5 9
$ C2: int  1 5 9

I would appreciate it if someone could shed some light on this.

(I'm not exactly clear what bug you think you have found. The data frames were not created the same way.) There are two differences in the structures of var.a and var.b: the mode of the elements in the columns, which is numeric in 'var.a' and integer in 'var.b'; and the mode of the rownames, which is character for 'var.a' and integer for 'var.b':

> dput(var.b)
structure(list(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L)), .Names = c("C1", 
"C2"), row.names = c(1L, 5L, 9L), class = "data.frame")
> dput(var.a)
structure(list(C1 = c(1, 5, 9), C2 = c(1, 5, 9)), .Names = c("C1", 
"C2"), row.names = c("1", "5", "9"), class = "data.frame")

> mode(attr(var.b, "row.names"))
[1] "numeric"
> storage.mode(attr(var.b, "row.names"))
[1] "integer"
> mode(attr(var.a, "row.names"))
[1] "character"

Added note: If you only want to check for numerical equality, you should use the 'check.attributes' switch:

> all.equal(var.a, var.b, check.attributes=FALSE)
[1] TRUE
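
As a rough sketch of the two ways out, the snippet below rebuilds frames along the lines of the question and either ignores attributes or stores the row names of one frame as character before comparing. The names a, b, b_chr, res_values and res_full are illustrative and do not come from the original post.

```r
# Rebuild the two frames along the lines of the question (a and b are illustrative names).
a <- data.frame(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L))
rownames(a) <- c("1", "5", "9")            # row names stored as character
b <- data.frame(C1 = 1:10, C2 = 1:10)
b <- b[seq(1, nrow(b), 4), ]               # row names kept as integer 1, 5, 9

# Option 1: compare values only, ignoring all attributes.
res_values <- all.equal(a, b, check.attributes = FALSE)

# Option 2: store b's row names as character, then compare everything.
b_chr <- b
attr(b_chr, "row.names") <- as.character(attr(b, "row.names"))
res_full <- all.equal(a, b_chr)

print(res_values)
print(res_full)
```

Option 2 keeps the attribute check in place, which may be preferable when column names or classes should still be compared.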

If you look at var.b with dput, you can see that the rownames are numeric:

> dput(var.b)
structure(list(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L)), .Names = c("C1", 
"C2"), row.names = c(1L, 5L, 9L), class = "data.frame")

However,

typeof(rownames(var.a)) == typeof(rownames(var.b))

returns TRUE, which confuses me.

In addition to the top-voted answer, note that the row.names attribute is stored as "character" for var.a and as "numeric" for var.b:

> attr(var.a, "row.names")
[1] "1" "5" "9"
> attr(var.b, "row.names")
[1] 1 5 9

The rownames() function, by contrast, coerces its output value to "character":

> rownames(var.a)
[1] "1" "5" "9"
> rownames(var.b)
[1] "1" "5" "9"

This is why you get TRUE in the command above. As per ?rownames:

For a data frame, value for rownames should be a character vector of non-duplicated and non-missing names (this is enforced), and for colnames a character vector of (preferably) unique syntactically-valid names. In both cases, value will be coerced by as.character, and setting colnames will convert the row names to character.

A more pertinent check would be:

> typeof(attr(var.a, "row.names")) == typeof(attr(var.b, "row.names"))
[1] FALSE
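
To make the difference between the stored attribute and the accessor concrete, here is a minimal sketch on a frame built the same way as var.b. The variable names stored and reported are only illustrative.

```r
# A frame whose row names end up stored as integers (same construction as var.b).
b <- data.frame(C1 = 1:10, C2 = 1:10)[seq(1, 10, 4), ]

stored   <- attr(b, "row.names")  # the attribute as it is stored
reported <- rownames(b)           # accessor output, coerced to character

print(typeof(stored))
print(typeof(reported))
```

Comparing typeof(rownames(...)) therefore cannot reveal the mismatch, because both sides have already been coerced by the accessor.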

That said, I believe that all.equal() messages are cryptic at best...

The columns of one data frame are numeric (double, shown as num) and those of the other are integer (shown as int). You can see this with:

str(var.a); str(var.b)


> str(var.a); str(var.b)
'data.frame':   3 obs. of  2 variables:
 $ C1: num  1 5 9
 $ C2: num  1 5 9
'data.frame':   3 obs. of  2 variables:
 $ C1: int  1 5 9
 $ C2: int  1 5 9
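
Note that a double versus integer difference on its own does not make all.equal() report a mismatch, since it compares numbers by value; only identical() is strict about the storage type. A small sketch with two hypothetical vectors, x_dbl and x_int:

```r
x_dbl <- c(1, 5, 9)      # stored as double
x_int <- c(1L, 5L, 9L)   # stored as integer

# mode() reports "numeric" for both; typeof() shows the storage type.
modes <- c(mode(x_dbl), mode(x_int))
types <- c(typeof(x_dbl), typeof(x_int))

# all.equal() compares by value, identical() also compares the storage type.
same_value <- isTRUE(all.equal(x_dbl, x_int))
same_exact <- identical(x_dbl, x_int)

print(modes)
print(types)
print(same_value)
print(same_exact)
```

This suggests that the messages in the question come from the row names rather than from the column types.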
