简体   繁体   中英

Aggregating one matrix by values in another matrix

As I am wont to do, I'm keeping tabs on my cats using matrices.

catWeights <- cbind(fluffy=c(5.0,5.1,5.2,5.3),misterCuddles=c(1.2,1.3,1.4,1.5),captainMew=c(4.3,4.2,4.1,4.0))
catTypes <- cbind(fluffy=c('cat','cat','cat','cat'),misterCuddles=c('kitten','kitten','kitten','cat'),captainMew=c('cat','cat','cat','cat'))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
row.names(catWeights) <- dates
row.names(catTypes) <- dates

On any date, I know how much each of them weigh:

> catWeights
           fluffy misterCuddles captainMew
2013-01-01    5.0           1.2        4.3
2013-01-02    5.1           1.3        4.2
2013-01-03    5.2           1.4        4.1
2013-01-04    5.3           1.5        4.0

And I know whether they're cats or kittens:

> catTypes
           fluffy misterCuddles captainMew
2013-01-01 "cat"  "kitten"      "cat"     
2013-01-02 "cat"  "kitten"      "cat"     
2013-01-03 "cat"  "kitten"      "cat"     
2013-01-04 "cat"  "cat"         "cat"  

How can I tell how much all my cats and all my kittens weigh through time?

I want this:

> totalWeights

             cat    kitten
2013-01-01   9.3       1.2
2013-01-02   9.3       1.3
2013-01-03   9.3       1.4
2013-01-04  10.8       0.0

On the fourth of January, Mister Cuddles turned 1, so he was no longer a kitten. His weight moved from the kitten bucket to the cat bucket.

This seems valid using the sample data:

do.call(cbind, 
      lapply(c("cat", "kitten"), 
            function(x) rowSums(catWeights * (catTypes == x))))
#           [,1] [,2]
#2013-01-01  9.3  1.2
#2013-01-02  9.3  1.3
#2013-01-03  9.3  1.4
#2013-01-04 10.8  0.0

EDIT:

As @BlueMagister commented ... lapply(unique(as.vector(catTypes)), ... is the more general form of the answer. I guess, though, you've already found a way to overcome this, since you accepted the answer. The as.vector is because unique has a matrix method that is not convenient in this specific case.

Also, since I 'm in editing mode, I'll note that sapply could 've been used, but based on some rough benchmarks I 've made from time to time, I 've found lapply to be faster even if it is accompanied by a do.call(r/cbind, ..) or a unlist . I did not test it for a larger dataset in this specific case, though.

So, another format of the answer could've been:

sapply(unique(as.vector(catTypes)), 
             function(x) rowSums(catWeights * (catTypes == x)))

Here is a not very general answer that applies only to the example data set.

# Construct matrices for the cat weights and kitten weights
catWts <- ifelse(catTypes=="cat", catWeights[catTypes=="cat"], 0)
kittenWts <- ifelse(catTypes=="kitten", catWeights[catTypes=="kitten"], 0)

# Well, then just take the row sums for the two matrices
catSums <- rowSums(catWts)
kittenSums <- rowSums(kittenWts)

# Then combine it to a data frame
totalWeights <- data.frame(cat=catSums, kitten=kittenSums)

# In one line
data.frame(cat=rowSums(ifelse(catTypes=="cat", catWeights[catTypes=="cat"], 0)),
           kitten=rowSums(ifelse(catTypes=="kitten", catWeights[catTypes=="kitten"], 0)))

#            cat kitten
#2013-01-01  9.0    1.3
#2013-01-02 10.1    1.4
#2013-01-03 10.3    1.2
#2013-01-04 14.6    0.0

I would imagine that there is a more general approach to solving this problem.

Microbenchmarking alexis_laz's two solutions on a 2500x2500 matrix with 10 groups:

> microbenchmark(cbindLapply(), sapplyOnly(), times=100)
Unit: milliseconds
          expr      min       lq   median       uq      max neval
 cbindLapply() 841.4796 865.2220 879.9099 892.6265 990.5915   100
  sapplyOnly() 846.3675 869.7372 879.0286 901.3314 979.6136   100

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM