R: Function to identify identical rows in a binary matrix and return a label vector
I am looking for a (fast) function that identifies identical rows in a matrix containing only the integers 0 and 1 and returns a vector of labels telling me which rows are identical.
Here is a reproducible example of what I want to achieve:
mat = rbinom(n=1000, size=1, prob=0.8)
dim(mat) = c(200, 5)
umat = unique(mat)
idVec = numeric(nrow(mat))
for(i in seq_len(nrow(umat))){
for(j in seq_len(nrow(mat))){
if(isTRUE(all.equal(mat[j,], umat[i,]))){
idVec[j] = i
}
}
}
cbind(idVec, mat)
table(idVec)
Actually, the function at http://www.stat.washington.edu/~rje42/lca/html/group.html would be just perfect. However, it is not on CRAN, comes with no source code, and was built prior to R 3.0.0.
Thanks for your help!
I reduced your example mat a bit for better handling:
mat = rbinom(n=100, size=1, prob=0.8)
dim(mat) = c(20, 5)
Now you can create idVec like this (assuming you don't care about the actual numbers, just the correct "mapping"):
idVec <- as.integer(factor(apply(mat, 1, toString)))
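If the labels should instead follow the order in which each distinct row first appears (which is what the loop over unique(mat) in the question produces), a minimal sketch is to match the row keys against their own unique values. The seed and the variable name key are assumptions added here for reproducibility and readability; they are not part of the original example.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
mat <- matrix(rbinom(100, size = 1, prob = 0.8), nrow = 20, ncol = 5)

# One string key per row, e.g. "1, 1, 0, 1, 1"
key <- apply(mat, 1, toString)

# Label each row by the position of its key among the distinct keys,
# numbered in order of first appearance
idVec <- match(key, unique(key))

# Sanity check: indexing the unique rows by label rebuilds the matrix
stopifnot(all(unique(mat)[idVec, , drop = FALSE] == mat))
```

The grouping is the same as with the factor() version; only the numbering of the groups differs.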
And of course you can bind idVec to the matrix or tabulate it:
> cbind(idVec, mat)
idVec
[1,] 6 1 1 1 1 1
[2,] 5 1 1 1 1 0
[3,] 6 1 1 1 1 1
[4,] 5 1 1 1 1 0
[5,] 1 0 1 1 0 1
[6,] 2 0 1 1 1 1
[7,] 6 1 1 1 1 1
[8,] 6 1 1 1 1 1
[9,] 6 1 1 1 1 1
[10,] 5 1 1 1 1 0
[11,] 4 1 0 1 1 1
[12,] 5 1 1 1 1 0
[13,] 6 1 1 1 1 1
[14,] 4 1 0 1 1 1
[15,] 3 1 0 1 0 0
[16,] 1 0 1 1 0 1
[17,] 6 1 1 1 1 1
[18,] 6 1 1 1 1 1
[19,] 6 1 1 1 1 1
[20,] 2 0 1 1 1 1
> table(idVec)
idVec
1 2 3 4 5 6
2 2 1 2 4 9
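Since the question asks for something fast, here is a sketch of a vectorised variant that avoids calling a function once per row. It reads each 0/1 row as a binary number with a single matrix product, so it assumes the matrix really contains only 0 and 1 and has few enough columns (roughly 50) for the codes to stay exact in double precision. The seed and the name code are assumptions, and I have not benchmarked it against the apply() version.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
mat <- matrix(rbinom(1000, size = 1, prob = 0.8), nrow = 200, ncol = 5)

# Treat each row as a binary number: column j contributes 2^(j - 1)
code <- as.vector(mat %*% 2^(seq_len(ncol(mat)) - 1))

# Dense labels 1..k, numbered by increasing code
idVec <- match(code, sort(unique(code)))

table(idVec)
```

As before, only the mapping matters: rows get the same label exactly when they are identical, but the label numbers will generally differ from those of the factor() approach.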