Create an X% probability matrix from a list of matrices
I have a list of matrices (several hundred thousand). I want to create a single matrix whose cells reflect a threshold of, e.g., 95%. By that I mean: if a cell, e.g. mat[1,2], is positive (i.e. > 0) in at least 95% of the matrices, it is scored 1, and if a cell, e.g. mat[2,1], is negative (i.e. < 0) in at least 95% of the matrices, it is scored -1. Cells that fall below this threshold are scored 0.
#Dummy data
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0.5, sd = 1),5,5)}
listX2 <- listX
for(i in 1:10) { listX2[[i]] <- ifelse(listX[[i]] >0, 1, -1) }
For the sake of the dummy data, the 95% can be changed to, say, 60%, so that cells that keep their sign in 6 out of 10 matrices are scored either 1 or -1 and the rest are scored 0.
I'm stuck on this, hence I cannot provide any more code.
I would do:
listX <- list()
set.seed(20)
# I set the seed for reproducibility, and changed your mean so you could see the negatives
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1),5,5)}
threshold <- 0.7
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 -1 1
[2,] -1 1 -1 -1 1
[3,] 0 0 0 1 1
[4,] 0 1 0 0 0
[5,] 0 0 0 0 0
This checks both conditions and subtracts the second check from the first. To break down one of the conditions:
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold)
lapply(listX,function(x){x > 0}) loops through each matrix and converts it to a TRUE/FALSE matrix marking every value that is above zero.
Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) then adds these all together (Reduce) and divides by the number of observations, giving the proportion of matrices in which each cell is positive. If that proportion is at least our threshold, the cell is set to one, and if not it is zero.
We then subtract the same matrix built with x < 0 as the test, which gives -1 in each cell where enough of the values are negative.
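As a minimal sketch, the one-liner can be unpacked into named steps. The wrapper name sign_consensus and the names n_pos and n_neg are hypothetical, introduced only for illustration; the logic is the same Reduce/lapply combination described above.

```r
# Hypothetical wrapper around the Reduce/lapply one-liner
sign_consensus <- function(matrix_list, threshold) {
  n <- length(matrix_list)
  # Per-cell count of matrices in which the value is strictly positive
  n_pos <- Reduce(`+`, lapply(matrix_list, function(x) x > 0))
  # Per-cell count of matrices in which the value is strictly negative
  n_neg <- Reduce(`+`, lapply(matrix_list, function(x) x < 0))
  # TRUE/FALSE coerce to 1/0, so the difference is 1, 0 or -1
  (n_pos / n >= threshold) - (n_neg / n >= threshold)
}

set.seed(20)
listX <- lapply(1:10, function(i) matrix(rnorm(25, mean = 0, sd = 1), 5, 5))
res <- sign_consensus(listX, 0.7)
res
```

Note that with a threshold of 0.5 or lower both checks can be TRUE for the same cell, in which case they cancel to 0, so the approach is most meaningful for thresholds above one half.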
You can change the list to an array and then, for each cell, take the mean across the matrices (the third dimension of the array).
arr <- simplify2array(listX)
grzero = rowMeans(arr > 0, dims = 2)
lezero = rowMeans(arr < 0, dims = 2)
prop = 0.6
1* (grzero >= prop) + -1* (lezero >= prop)
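To make the role of dims = 2 concrete, here is a small sketch on dummy data. The variable names p_pos and p_pos_apply are illustrative and not part of the answer. It checks that rowMeans with dims = 2 keeps the first two dimensions and averages over the third, the same result as an explicit apply over rows and columns.

```r
set.seed(1)
listX <- lapply(1:10, function(i) matrix(rnorm(25, mean = 0.5, sd = 1), 5, 5))

# Stack the 10 matrices into a 5 x 5 x 10 array
arr <- simplify2array(listX)
dim(arr)  # 5 5 10

# Proportion of matrices in which each cell is positive
p_pos <- rowMeans(arr > 0, dims = 2)

# The same quantity written with apply: more explicit, typically slower
p_pos_apply <- apply(arr > 0, c(1, 2), mean)

stopifnot(isTRUE(all.equal(p_pos, p_pos_apply)))
```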
Below you'll find my original answer. It ended up producing results comparable to the other answers on test cases involving randomly seeded data. To triple check, I created a small test data set with a known answer. It turns out that only the answer by @Chris passes right now (though @user20650's should be OK on this example if it uses >=, as indicated in the comments). Here it is in case anyone else wants to use it:
listX <- list(
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0, 1,0), nrow = 2)
)
# With any threshold < .67,
# result should be...
matrix(c(1, 0, -1, 1), nrow = 2)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
# Otherwise...
matrix(c(1, 0, 0, 0), nrow = 2)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# @Chris answer passes
threshold <- 0.5
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
threshold <- 1.0
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# My function fails...
prob_matrix(listX, .5)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
prob_matrix(listX, 1)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
# @user20650 answer fails...
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 0.5
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 1.0
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
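The comparison above can also be automated. The sketch below is only an illustration: the helper passes_known_case and the wrappers reduce_ge, array_gt and array_ge are hypothetical names that repackage the expressions already shown, so that each candidate can be checked against both known answers in one call.

```r
known_case <- list(
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0,  1, 0), nrow = 2)
)

# Hypothetical helper: TRUE when f reproduces both known answers
passes_known_case <- function(f, matrix_list) {
  expected_half <- matrix(c(1, 0, -1, 1), nrow = 2)  # threshold 0.5
  expected_full <- matrix(c(1, 0,  0, 0), nrow = 2)  # threshold 1.0
  isTRUE(all.equal(f(matrix_list, 0.5), expected_half)) &&
    isTRUE(all.equal(f(matrix_list, 1.0), expected_full))
}

# Reduce-based expression with >=
reduce_ge <- function(l, thr) {
  (Reduce(`+`, lapply(l, function(x) x > 0)) / length(l) >= thr) -
    (Reduce(`+`, lapply(l, function(x) x < 0)) / length(l) >= thr)
}

# Array-based expression with a strict > comparison
array_gt <- function(l, thr) {
  arr <- simplify2array(l)
  1 * (rowSums(arr > 0, dims = 2) / length(l) > thr) -
    1 * (rowSums(arr < 0, dims = 2) / length(l) > thr)
}

# Array-based expression with >=
array_ge <- function(l, thr) {
  arr <- simplify2array(l)
  1 * (rowSums(arr > 0, dims = 2) / length(l) >= thr) -
    1 * (rowSums(arr < 0, dims = 2) / length(l) >= thr)
}

passes_known_case(reduce_ge, known_case)  # TRUE
passes_known_case(array_gt, known_case)   # FALSE
passes_known_case(array_ge, known_case)   # TRUE
```

The strict version only misses the threshold of 1.0, where no proportion can exceed the threshold.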
Here's one approach: use sign and Reduce to sum the signs of the values in each cell, returning a single matrix; set the cells that fall below the threshold to 0; then take the sign() of all cells. Below is an example with a wrapper function:
Toy data...
set.seed(12)
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5)}
Function...
prob_matrix <- function(matrix_list, prob) {
# Sum the signs of values in each cell
matrix_list <- lapply(matrix_list, sign)
x <- Reduce(`+`, matrix_list)
# Convert cells below prob to 0, others to relevant sign
x[abs(x) < (prob * length(matrix_list)) / 2] <- 0
sign(x)
}
Example cases...
prob_matrix(listX, .2)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .4)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .6)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
prob_matrix(listX, .8)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
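One possible reading of why this sign-sum version misses the known answer: the summed signs equal the number of positive entries minus the number of negative entries, which is not the same as the share of matrices with a given sign, and zeros do not count against either side. The sketch below illustrates this on the small test case. The names known_case, signed_sum and prob_matrix_fixed are hypothetical, and the corrected function is an assumption about the intended behavior, not the author's code.

```r
known_case <- list(
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0,  1, 0), nrow = 2)
)

# Summed signs: cell [2,2] is positive twice and zero once, so the sum is 2
signed_sum <- Reduce(`+`, lapply(known_case, sign))
signed_sum[2, 2]

# With prob = 1 the cutoff in prob_matrix is 1 * 3 / 2 = 1.5, so a sum of 2
# survives even though the cell is positive in only 2 of 3 matrices
cutoff <- (1 * length(known_case)) / 2

# Hypothetical variant: count each sign separately, then compare proportions
prob_matrix_fixed <- function(matrix_list, prob) {
  signs <- lapply(matrix_list, sign)
  n <- length(signs)
  n_pos <- Reduce(`+`, lapply(signs, function(s) s == 1))
  n_neg <- Reduce(`+`, lapply(signs, function(s) s == -1))
  (n_pos / n >= prob) - (n_neg / n >= prob)
}

prob_matrix_fixed(known_case, 0.5)
prob_matrix_fixed(known_case, 1)
```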