
Iterating conditional sums in R

I have a series of two-dimensional numerical matrices consisting of 1s and 0s. (So I suppose they can also be seen as logical arrays.) For each such matrix, I want to generate a vector whose length equals the number of columns. For every column, it would contain the sum of the row totals over the rows where that column's entry is 1.

Here's what I have for single columns:

# Generate sample data
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
# Calculate row sums
scores <- rowSums(dataset)
# Calculate the desired statistic for column 1
M1_1 <- sum(scores[which(dataset[, 1] == 1)])
# Calculate the same statistic for column 2
M1_2 <- sum(scores[which(dataset[, 2] == 1)])

Obviously, instead of writing M1_1, M1_2, ..., M1_n, I want to define M1_X to iterate through every column. I suspect it's a really simple thing to do, but haven't been able to figure out how to do it. Any guidance would be appreciated.

We can split the matrix into its columns, loop over them with sapply, and sum the 'scores' of the rows where the entry is 1

as.vector(sapply(split(dataset, col(dataset)), function(x) sum(scores[x==1])))
#[1] 56 47 50 53 55 48 75 67 40 55
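
To make the split step easier to follow, here is a minimal sketch on a small hand-made matrix. The matrix `m` and the names `row_totals`, `pieces` and `res` are illustrative assumptions, not part of the OP's data.

```r
# Hypothetical 4 x 3 example matrix (not the OP's data)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)
row_totals <- rowSums(m)            # 3 1 1 2

# col(m) labels every entry with its column index, so split()
# returns one numeric vector per column
pieces <- split(m, col(m))
pieces[["2"]]                       # 1 1 0 1

# For each column, add up the row totals of the rows holding a 1
res <- as.vector(sapply(pieces, function(x) sum(row_totals[x == 1])))
res                                 # 4 6 5
```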

Or using apply

apply(dataset, 2, function(x) sum(scores[x==1]))
#[1] 56 47 50 53 55 48 75 67 40 55
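
Since the OP asked for something like M1_X for every column, the apply call could be wrapped in a small helper that also names the results. This is only a sketch: the function name `col_score_sums` and the example matrix `m` are hypothetical, and the "M1_" prefix is borrowed from the question.

```r
# Hypothetical helper: for each column, sum the row totals of the
# rows where that column holds a 1
col_score_sums <- function(mat) {
  stopifnot(is.matrix(mat))
  row_totals <- rowSums(mat)
  out <- apply(mat, 2, function(x) sum(row_totals[x == 1]))
  names(out) <- paste0("M1_", seq_len(ncol(mat)))
  out
}

# Hypothetical 4 x 3 example matrix (not the OP's data)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)
res <- col_score_sums(m)
res
# M1_1 M1_2 M1_3
#    4    6    5
```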

Or a vectorized approach is to expand 'scores' to the full length of 'dataset' by indexing it with row(dataset), and then multiply the two elementwise. Both operands then have the same length, so nothing depends on recycling (which can be dangerous at times)

colSums(scores[row(dataset)]*dataset)
#[1] 56 47 50 53 55 48 75 67 40 55
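
A short sketch of what the row() indexing does, again on a hand-made matrix. The names `m`, `row_totals`, `expanded` and `res` are assumptions for illustration.

```r
# Hypothetical 4 x 3 example matrix (not the OP's data)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)
row_totals <- rowSums(m)            # 3 1 1 2

# row(m) holds each entry's row index, so indexing with it repeats
# row_totals once per column
expanded <- row_totals[row(m)]
expanded                            # 3 1 1 2 3 1 1 2 3 1 1 2
length(expanded) == length(m)       # TRUE, so no recycling is needed

res <- colSums(expanded * m)
res                                 # 4 6 5
```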

Or another intuitive option is sweep

colSums(sweep(dataset, 1, scores, FUN = "*"))
#[1] 56 47 50 53 55 48 75 67 40 55

These match the values computed in the OP's post:

M1_1
#[1] 56
M1_2
#[1] 47

Or as @user20650 commented, a concise option is crossprod, which returns the result as a 1 x 10 matrix rather than a plain vector

crossprod(scores, dataset)

Or without computing 'scores' in a separate step, because the row sums of t(dataset) %*% dataset are the same as t(dataset) %*% rowSums(dataset)

rowSums(crossprod(dataset))
#[1] 56 47 50 53 55 48 75 67 40 55
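
The two crossprod forms can be checked against each other on a small example. This is a sketch under the assumption of a hand-made matrix `m`; `via_scores` and `via_gram` are illustrative names.

```r
# Hypothetical 4 x 3 example matrix (not the OP's data)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)

# t(rowSums(m)) %*% m gives a 1 x 3 matrix; drop() reduces it to a vector
via_scores <- drop(crossprod(rowSums(m), m))

# crossprod(m) is t(m) %*% m, a 3 x 3 matrix of column overlaps;
# summing each of its rows gives the same totals
via_gram <- rowSums(crossprod(m))

via_scores                          # 4 6 5
via_gram                            # 4 6 5
```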

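We can just multiply the matrix of 0s and 1s by the corresponding scores and then sum column-wise. Because 'scores' has one entry per row, R recycles it down each column, so every row is scaled by its own total

colSums(dataset * scores)
#[1] 44 58 50 53 42 60 43 46 55 45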
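
The recycling here works because R stores matrices column by column and the vector has one entry per row. A minimal sketch of both the working case and the case where recycling silently misleads follows. The matrix `m` and the per-column weights `w` are hypothetical.

```r
# Hypothetical 4 x 3 example matrix (not the OP's data)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)
row_totals <- rowSums(m)            # 3 1 1 2

# A length-nrow vector lines up with the rows of every column
ok <- colSums(m * row_totals)
ok                                  # 4 6 5

# A length-ncol vector is ALSO recycled down the columns, without a
# warning here because 12 is a multiple of 3, so it does not scale
# column j by w[j]
w <- c(10, 20, 30)
recycled <- m * w
intended <- sweep(m, 2, w, FUN = "*")
recycled[3, 1]                      # 30
intended[3, 1]                      # 10
```

This is why sweep, or the explicit row() indexing, can be the safer choice when it is not obvious which dimension a vector belongs to.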

Matrix multiplication also works (the output below is reproducible with set.seed(123) called before generating the sample data):

as.numeric(matrix(scores, nrow=1) %*% dataset)
# [1] 53 72 16 51 43 49 51 49 30 66
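
The outputs printed in the answers above differ from one another because the sample data are drawn at random. A minimal sketch for checking that the approaches agree on a single draw is given below. It assumes the seed value 123 mentioned above, and the `by_*` names are illustrative.

```r
set.seed(123)  # assumed seed, taken from the remark above
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
scores <- rowSums(dataset)

by_apply   <- apply(dataset, 2, function(x) sum(scores[x == 1]))
by_sweep   <- colSums(sweep(dataset, 1, scores, FUN = "*"))
by_recycle <- colSums(dataset * scores)
by_matmul  <- as.numeric(matrix(scores, nrow = 1) %*% dataset)
by_cross   <- drop(crossprod(scores, dataset))
by_gram    <- rowSums(crossprod(dataset))

# Every approach should give the same ten column statistics
stopifnot(
  isTRUE(all.equal(by_apply, by_sweep)),
  isTRUE(all.equal(by_apply, by_recycle)),
  isTRUE(all.equal(by_apply, by_matmul)),
  isTRUE(all.equal(by_apply, by_cross)),
  isTRUE(all.equal(by_apply, by_gram))
)
```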
