
Create an X% probability matrix from a list of matrices

I have a list of matrices (several hundred thousand). I want to create a single matrix whose cells summarize, e.g., the 95% level. By that I mean: if, e.g., cell mat[1,2] is positive (i.e. > 0) in at least 95% of the matrices, it is scored 1, and if, e.g., cell mat[2,1] is negative (i.e. < 0) in at least 95% of the matrices, it is scored -1. Cells that fall below this threshold are scored 0.

#Dummy data
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0.5, sd = 1),5,5)}
listX2 <- listX
for(i in 1:10) { listX2[[i]] <- ifelse(listX[[i]] >0, 1, -1) }

For the dummy data, the 95% can be lowered to, say, 60%, so that cells that keep their sign in at least 6 out of 10 matrices are scored either 1 or -1 and the rest 0.

I'm stuck on this, so I can't provide any more code.
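
The scoring rule described in the question can be sketched for a single cell as follows. This is only an illustration: `score_cell` and the ten values are made up for the example, and the sketch assumes a threshold above 0.5 so that the two signs cannot both qualify.

```r
# Hypothetical helper (not from the question): score one cell given the
# values it takes across all matrices in the list.
score_cell <- function(values, threshold = 0.95) {
  p_pos <- mean(values > 0)  # share of matrices where the cell is positive
  p_neg <- mean(values < 0)  # share of matrices where the cell is negative
  if (p_pos >= threshold) return(1)
  if (p_neg >= threshold) return(-1)
  0
}

# Ten made-up values for one cell: 7 positive, 3 negative
cell_values <- c(0.4, 1.2, -0.3, 0.8, 0.1, -1.1, 0.9, 2.0, -0.2, 0.5)

score_cell(cell_values, threshold = 0.6)   # 7/10 positive, so scored 1
score_cell(cell_values, threshold = 0.95)  # neither sign reaches 95%, so scored 0
```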

I would do:

listX <- list()
set.seed(20)
# I set a seed for reproducibility, and changed your mean to 0 so you can see the negatives
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1),5,5)}

threshold <- 0.7
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0   -1    1
[2,]   -1    1   -1   -1    1
[3,]    0    0    0    1    1
[4,]    0    1    0    0    0
[5,]    0    0    0    0    0

This basically checks both conditions and combines the two checks by subtracting the second from the first. To break down one of the conditions, (Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold):

lapply(listX,function(x){x > 0}) loops through each matrix and converts it to a TRUE/FALSE matrix marking every value that is above zero.

Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) then adds these all together (Reduce) and divides by the number of matrices. If the proportion is at least our threshold, we set that value to one; if not, it is zero.

We then subtract the same matrix computed with x < 0 as the test, which gives -1 in each cell where enough of the values are negative.
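
The breakdown above can also be written with named intermediate steps. This is a sketch of the same expression, not a different method; the variable names (`pos_count`, `neg_hit`, and so on) are introduced here for illustration only.

```r
set.seed(20)
listX <- lapply(1:10, function(i) matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5))
threshold <- 0.7
n <- length(listX)

# Step 1: per-cell counts of matrices in which the value is positive / negative
pos_count <- Reduce(`+`, lapply(listX, function(x) x > 0))
neg_count <- Reduce(`+`, lapply(listX, function(x) x < 0))

# Step 2: turn counts into proportions and compare with the threshold
pos_hit <- pos_count / n >= threshold
neg_hit <- neg_count / n >= threshold

# Step 3: TRUE/FALSE behave as 1/0, so the difference is 1, 0 or -1 per cell
result <- pos_hit - neg_hit
result
```

Splitting the expression this way also makes it easy to inspect `pos_count` and `neg_count` directly when a cell's score looks surprising.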

You can convert the list to an array and then, for each cell, take the mean across the matrices.

arr <- simplify2array(listX)
grzero = rowMeans(arr > 0, dims = 2) 
lezero = rowMeans(arr < 0, dims = 2)  

prop = 0.6

1* (grzero >= prop) + -1* (lezero >= prop)
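
As a sanity check, the array approach can be wrapped in a function and compared with the Reduce-based expression on random data. This is a minimal sketch: `sign_consensus` and `mats` are names made up for the example. One caveat worth hedging: the array holds a copy of every matrix, and `arr > 0` builds another array of the same shape, so with several hundred thousand matrices memory use may be a consideration.

```r
# Hypothetical wrapper around the array-based approach
sign_consensus <- function(matrix_list, prop) {
  arr <- simplify2array(matrix_list)        # rows x cols x number of matrices
  pos_share <- rowMeans(arr > 0, dims = 2)  # mean over the third dimension
  neg_share <- rowMeans(arr < 0, dims = 2)
  (pos_share >= prop) - (neg_share >= prop)
}

set.seed(1)
mats <- lapply(1:10, function(i) matrix(rnorm(25), 5, 5))

dim(simplify2array(mats))  # 5 5 10

via_array <- sign_consensus(mats, 0.6)

# Same rule written with Reduce, for comparison
n <- length(mats)
via_reduce <- (Reduce(`+`, lapply(mats, function(x) x > 0)) / n >= 0.6) -
  (Reduce(`+`, lapply(mats, function(x) x < 0)) / n >= 0.6)

all(via_array == via_reduce)
```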

Test case showing which answers work so far! (edit)

Below you'll find my original answer. It ended up producing results comparable to the other answers on test cases involving randomly seeded data. To triple check, I created a small test data set with a known answer. It turns out that only the answer by @Chris passes right now (though @user20650's should be OK if >= is used in this example, as indicated in the comments). Here it is in case anyone else wants to use it:

listX <- list(
  matrix(c(1,0,-1,1), nrow = 2),
  matrix(c(1,0,-1,1), nrow = 2),
  matrix(c(1,0, 1,0), nrow = 2)
)

# With any threshold < .67,
# result should be...
matrix(c(1, 0, -1, 1), nrow = 2)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

# Otherwise...
matrix(c(1, 0, 0, 0), nrow = 2)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    0

# @Chris answer passes
threshold <- 0.5
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

threshold <- 1.0
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    0

# My function (prob_matrix, defined under "Original answer" below) fails at threshold 1...
prob_matrix(listX, .5)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1
prob_matrix(listX,  1)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1

# @user20650 answer (using > rather than >=) fails at threshold 1...
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 0.5
1* (grzero > prop) + -1* (lezero > prop)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 1.0
1* (grzero > prop) + -1* (lezero > prop)
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
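
The known-answer check above can also be expressed as assertions, so that a failing approach stops with an error instead of needing a visual comparison. This is a sketch: `consensus`, `expected_low` and `expected_high` are names introduced for the example, and `consensus` simply wraps the Reduce-based expression.

```r
listX <- list(
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0,  1, 0), nrow = 2)
)

# Hypothetical wrapper around the Reduce-based expression
consensus <- function(matrix_list, threshold) {
  n <- length(matrix_list)
  pos <- Reduce(`+`, lapply(matrix_list, function(x) x > 0)) / n
  neg <- Reduce(`+`, lapply(matrix_list, function(x) x < 0)) / n
  (pos >= threshold) - (neg >= threshold)
}

expected_low  <- matrix(c(1, 0, -1, 1), nrow = 2)  # threshold 0.5
expected_high <- matrix(c(1, 0,  0, 0), nrow = 2)  # threshold 1.0

# stopifnot() is silent when every condition holds
stopifnot(
  all(consensus(listX, 0.5) == expected_low),
  all(consensus(listX, 1.0) == expected_high)
)
```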

Original answer

Here's one approach...

  • Combine sign and Reduce to sum the signs of the values in each cell across the list, returning a single matrix.
  • Any cell where the absolute value of this sum is less than the cutoff (in the function below, your probability * number of matrices in the list, divided by 2) is converted to 0.
  • Return the sign() of all cells.

Below is an example with a wrapper function:

Toy data...

set.seed(12)
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5)}

Function...

prob_matrix <- function(matrix_list, prob) {
  # Sum the signs of values in each cell
  matrix_list <- lapply(matrix_list, sign)
  x <- Reduce(`+`, matrix_list)

  # Convert cells below prob to 0, others to relevant sign
  x[abs(x) < (prob * length(matrix_list)) / 2] <- 0
  sign(x)
}

Example cases...

prob_matrix(listX, .2)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   -1    1    0    1    0
#> [2,]   -1    0   -1   -1    0
#> [3,]    1   -1    1    1    1
#> [4,]    0   -1    1    1   -1
#> [5,]   -1    0   -1    0   -1

prob_matrix(listX, .4)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   -1    1    0    1    0
#> [2,]   -1    0   -1   -1    0
#> [3,]    1   -1    1    1    1
#> [4,]    0   -1    1    1   -1
#> [5,]   -1    0   -1    0   -1

prob_matrix(listX, .6)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    0    1    0
#> [2,]   -1    0    0   -1    0
#> [3,]    1   -1    0    1    1
#> [4,]    0    0    0    1   -1
#> [5,]   -1    0    0    0   -1

prob_matrix(listX, .8)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    0    1    0
#> [2,]   -1    0    0   -1    0
#> [3,]    1   -1    0    1    1
#> [4,]    0    0    0    1   -1
#> [5,]   -1    0    0    0   -1
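
The example outputs are identical for .2 and .4, and again for .6 and .8. A short sketch of why, assuming the same toy data and no exact zeros: the sum of signs is the number of positives minus the number of negatives, so with 10 matrices it is always even, and neighbouring cutoffs zero out the same cells. The names `net` and `pos` are introduced here for illustration.

```r
set.seed(12)
listX <- lapply(1:10, function(i) matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5))
n <- length(listX)

net <- Reduce(`+`, lapply(listX, sign))               # positives minus negatives
pos <- Reduce(`+`, lapply(listX, function(x) x > 0))  # number of positives

# With no exact zeros, negatives = n - pos, so net = 2 * pos - n
all(net == 2 * pos - n)
all(net %% 2 == 0)  # hence always even when n = 10

# prob_matrix() uses the cutoff prob * n / 2, i.e. 1, 2, 3, 4 for .2, .4, .6, .8.
# Because |net| only takes even values, cutoffs 1 and 2 select the same cells,
# and so do cutoffs 3 and 4.
identical(abs(net) < 1, abs(net) < 2)
identical(abs(net) < 3, abs(net) < 4)
```

This also shows that the net sum is not the same quantity as the share of matrices with a given sign, which is consistent with the function failing the known-answer test above.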
