
How do I remove matrices from a list that are duplicates within floating-point error?

This question is similar to questions that have been asked about floating-point error in other languages (for example here), but I haven't found a satisfactory solution.

I'm working on a project that involves investigating matrices that share certain characteristics. As part of that, I need to know how many matrices in a list are unique.

D <- as.matrix(read.table("datasource",...))
mat_list <- vector('list', length = length(samples_list))
mat_list <- lapply(seq_along(samples_list), function(i) matrix(data = 0, nrow(D), ncol(D)))

This list is then populated by computations from the data based on the elements of samples_list. After mat_list has been populated, I need to remove duplicates. Running

mat_list <- unique(mat_list)

narrows things down quite a bit; however, many of the remaining elements are really within machine error of each other. The function unique does not allow one to specify a precision, and I was unable to find its source code to modify.
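A small illustration of the problem, using two made-up matrices a and b (not from my data) that differ only by floating-point rounding in one entry. It is only a sketch of why unique keeps both:

a <- matrix(c(0.1 + 0.2, 1, 2, 3), nrow = 2)
b <- matrix(c(0.3,       1, 2, 3), nrow = 2)

identical(a, b)              # exact comparison of the two matrices
length(unique(list(a, b)))   # unique() compares exactly, so both are kept
isTRUE(all.equal(a, b))      # comparison within a tolerance (default about 1.5e-8)

The first entry differs because 0.1 + 0.2 is not exactly 0.3 in double precision.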

One idea I had was this:

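ErrorReduction <- function(mat_list, tol = 2) {
  len <- length(mat_list)
  # seq_len(len - 1) avoids the precedence trap in 1:len-1, which is (1:len) - 1
  for (i in seq_len(len - 1)) {
    # The difference must be recomputed for each pair, inside the loop
    diff <- mat_list[[i]] - mat_list[[i + 1]]
    # Infinity norm of the difference; below tol, treat the pair as duplicates
    if (norm(diff, "I") < tol) {
      mat_list[[i + 1]] <- mat_list[[i]]
    }
  }
  mat_list <- unique(mat_list)
  return(mat_list)
}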

but this only compares each matrix with its immediate successor. Comparing every pair with nested for loops would be simple, but most likely inefficient.
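For reference, the nested-loop version might look something like the sketch below. The name dedupe_greedy and the default tol are placeholders I made up. It assumes all matrices have the same dimensions, and it measures closeness by the largest absolute entrywise difference:

dedupe_greedy <- function(mat_list, tol = 1e-8) {
  kept <- list()
  for (m in mat_list) {
    is_dup <- FALSE
    # Compare the candidate against every matrix kept so far
    for (k in kept) {
      if (max(abs(m - k)) < tol) {
        is_dup <- TRUE
        break
      }
    }
    # Keep the candidate only if nothing already kept is within tol of it
    if (!is_dup) kept[[length(kept) + 1]] <- m
  }
  kept
}

This needs a number of comparisons that grows quadratically with the length of the list in the worst case, and it drops any list names. Because each matrix is compared with the kept representatives rather than with its neighbour, the result can depend on the order of the list when matrices form a chain of near-duplicates.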

What methods do you know of, or what ideas do you have, for identifying and removing matrices that are within machine error of being duplicates?

Here is a function that applies all.equal to every pair using outer and removes all duplicates:

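approx.unique <- function(l) {
   # TRUE when the largest absolute entry of the difference is zero within all.equal's tolerance
   is.equal.fun <- function(i, j) isTRUE(all.equal(norm(l[[i]] - l[[j]], "M"), 0))
   # Logical matrix of comparisons over all index pairs
   is.equal.mat <- outer(seq_along(l), seq_along(l), Vectorize(is.equal.fun))
   # An element is a duplicate if it matches any earlier element (upper triangle)
   is.duplicate <- colSums(is.equal.mat * upper.tri(is.equal.mat)) > 0
   l[!is.duplicate]
}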

An example:

a <- matrix(runif(12), 4, 3)
b <- matrix(runif(12), 4, 3)
c <- matrix(runif(12), 4, 3)

all <- list(a1 = a, b1 = b, a2 = a, a3 = a, b2 = b, c1 = c)

names(approx.unique(all))
# [1] "a1" "b1" "c1"

I believe you are looking for all.equal, which compares objects 'within machine error'. Check out ?all.equal.
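A minimal sketch of how all.equal behaves on matrices, using made-up values x, y and z. Note that when the objects differ, all.equal returns a character description rather than FALSE, so it is usually wrapped in isTRUE:

x <- matrix(c(1, 2, 3, 4), 2, 2)
y <- x + 1e-10   # tiny perturbation
z <- x + 1e-3    # noticeable difference

isTRUE(all.equal(x, y))                     # equal under the default tolerance
all.equal(x, z)                             # a character string describing the difference
isTRUE(all.equal(x, z, tolerance = 1e-2))   # equal once the tolerance is loosened

The tolerance argument is what unique lacks; for numeric input it is applied to a mean relative difference by default, so it is not a strict entrywise bound.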
