简体   繁体   中英

create a X % probability matrix from list of matrices

I have a list of matrices (some hundred thousands). I want to create a single matrix where the cells correspond to eg the 95%. With that I mean this: if eg cell mat[1,2] is positive (ie >0) in 95% of the matrices it is scored a 1, and if eg cell mat[2,1] is negative (ie <0) in 95% of the matrices it is scored a -1. If they fall below this threshold they are scored a 0.

#Dummy data
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0.5, sd = 1),5,5)}
listX2 <- listX
for(i in 1:10) { listX2[[i]] <- ifelse(listX[[i]] >0, 1, -1) }

For the sake of the dummy data, the 95% can be changed to say 60%, such that the cells that keep their sign in 6 out of 10 matrices are kept and scored either 1 or -1 and the rest 0.

I'm stuck on this, hence cannot provide any more code.

I would do:

listX <- list()
set.seed(20)
# I set seed for reproducability, and changed your mean so you could see the negatives
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1),5,5)}

threshold <- 0.7
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0   -1    1
[2,]   -1    1   -1   -1    1
[3,]    0    0    0    1    1
[4,]    0    1    0    0    0
[5,]    0    0    0    0    0

This basically checks both conditions, and adds the two checks together. To break down one of the conditions (Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) > threshold)

lapply(listX,function(x){x > 0}) loops through each matrix and converts it to aa matrix of true/false for every value that is above zero.

Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) then adds these all together ( Reduce ), and divides by the number of obeservations. If the proportion is greater than our threshold, we set that value to one, and if not it is zero.

We then subtract the same matrix with x < 0 as the test, which gives -1 in each case where enough sub-values are negative.

You can change the list to an array and then take the mean over the dimensions.

arr <- simplify2array(listX)
grzero = rowMeans(arr > 0, dims = 2) 
lezero = rowMeans(arr < 0, dims = 2)  

prop = 0.6

1* (grzero >= prop) + -1* (lezero >= prop)

Test case showing which answers work so far! (edit)

Below you'll find my original answer. It ended up producing comparable results to the other answers on test cases involving randomly seeded data. To triple check, I created a small test data set with a known answer. It turns out that only answer by @Chris passes right now (though @user20650 should be ok if using >= on this example as indicated in comments). Here it is in case anyone else wants to use it:

listX <- list(
  matrix(c(1,0,-1,1), nrow = 2),
  matrix(c(1,0,-1,1), nrow = 2),
  matrix(c(1,0, 1,0), nrow = 2)
)

# With any threshold < .67,
# result should be...
matrix(c(1, 0, -1, 1), nrow = 2)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

# Otherwise...
matrix(c(1, 0, 0, 0), nrow = 2)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    0

# @Chris answer passes
threshold <- 0.5
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

threshold <- 1.0
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    0

# My function fails...
prob_matrix(listX, .5)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1
prob_matrix(listX,  1)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1

# @user20650 answer fails...
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 0.5
1* (grzero > prop) + -1* (lezero > prop)
#>      [,1] [,2]
#> [1,]    1   -1
#> [2,]    0    1

arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 1.0
1* (grzero > prop) + -1* (lezero > prop)
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0

Original answer

Here's one approach...

  • Combine sign and Reduce to do a cumulative sum of the signs of values in each cell, returning a single matrix.
  • Any cells where this value is less than the threshold number (your probability * number of matrices in the list) is converted to 0.
  • Return the sign() of all cells.

Below is an example with a wrapper function:

Toy data...

set.seed(12)
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5)}

Function...

prob_matrix <- function(matrix_list, prob) {
  # Sum the signs of values in each cell
  matrix_list <- lapply(matrix_list, sign)
  x <- Reduce(`+`, matrix_list)

  # Convert cells below prob to 0, others to relevant sign
  x[abs(x) < (prob * length(matrix_list)) / 2] <- 0
  sign(x)
}

Example cases...

prob_matrix(listX, .2)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   -1    1    0    1    0
#> [2,]   -1    0   -1   -1    0
#> [3,]    1   -1    1    1    1
#> [4,]    0   -1    1    1   -1
#> [5,]   -1    0   -1    0   -1

prob_matrix(listX, .4)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   -1    1    0    1    0
#> [2,]   -1    0   -1   -1    0
#> [3,]    1   -1    1    1    1
#> [4,]    0   -1    1    1   -1
#> [5,]   -1    0   -1    0   -1

prob_matrix(listX, .6)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    0    1    0
#> [2,]   -1    0    0   -1    0
#> [3,]    1   -1    0    1    1
#> [4,]    0    0    0    1   -1
#> [5,]   -1    0    0    0   -1

prob_matrix(listX, .8)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    0    1    0
#> [2,]   -1    0    0   -1    0
#> [3,]    1   -1    0    1    1
#> [4,]    0    0    0    1   -1
#> [5,]   -1    0    0    0   -1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM