
Use column name as weight to calculate weighted mean in R

Continuing from my previous (answered) question.

Say I have this data,

> df
  rank1 rank2 rank3 rank4 rank5
1     A     B     C     D     E
2     A     C     B     D     E
3     C     A     B     E     D
4     B     A     C     D     E
5     A     B     D     C     E

I managed to create a frequency table of rankings by item (thanks to akrun),

> df.frequency
     ranking
items 1 2 3 4 5
    A 3 2 0 0 0
    B 1 2 2 0 0
    C 1 1 2 1 0
    D 0 0 1 3 1
    E 0 0 0 1 4

> str(df.frequency)
 'table' int [1:5, 1:5] 3 1 1 0 0 2 2 1 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ items  : chr [1:5] "A" "B" "C" "D" ...
  ..$ ranking: chr [1:5] "1" "2" "3" "4" ...

In Excel, I use =SUMPRODUCT($B$1:$F$1,B2:F2)/SUM(B2:F2) to get the weighted mean,

    1   2   3   4   5   Mean
A   3   2   0   0   0   1.4
B   1   2   2   0   0   2.2
C   1   1   2   1   0   2.6
D   0   0   1   3   1   4
E   0   0   0   1   4   4.8

In R, how do I calculate the weighted mean of each item, where the weight is the rank (the column name)? I would also like to calculate the SD and the median.

Are you looking for something simple like this:

> a <- as.numeric(colnames(df.frequency)) ### ranks, taken from the column names
> z <- numeric(nrow(df.frequency))
> b <- t(apply(df.frequency, 1, function(x) x/sum(x))) ### row-wise proportions
> for(i in seq_len(nrow(df.frequency))){
+   z[i] <- sum(a*b[i,]) ### rank-weighted sum of the proportions
+ }
> z
[1] 1.4 2.2 2.6 4.0 4.8

If you want to add it as a column, just use cbind:

> cbind(df.frequency, z)
  1 2 3 4 5   z
A 3 2 0 0 0 1.4
B 1 2 2 0 0 2.2
C 1 1 2 1 0 2.6
D 0 0 1 3 1 4.0
E 0 0 0 1 4 4.8
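
For what it's worth, the same numbers can probably be had without an explicit loop by handing each row of counts to base R's weighted.mean() as the weights. This is only a sketch: it rebuilds the question's data so it runs on its own, and the names ranks and wmean are mine, not from the question.

```r
# Rebuild the example data from the question (one row per respondent)
df <- data.frame(
  rank1 = c("A", "A", "C", "B", "A"),
  rank2 = c("B", "C", "A", "A", "B"),
  rank3 = c("C", "B", "B", "C", "D"),
  rank4 = c("D", "D", "E", "D", "C"),
  rank5 = c("E", "E", "D", "E", "E"),
  stringsAsFactors = FALSE
)

# Frequency table: items in rows, rank positions in columns
df.frequency <- table(items = unlist(df), ranking = c(col(df)))

# The values being averaged are the ranks (column names);
# the counts in each row act as the weights
ranks <- as.numeric(colnames(df.frequency))
wmean <- apply(df.frequency, 1, function(f) weighted.mean(ranks, w = f))

print(wmean)
```

This mirrors the Excel SUMPRODUCT(...)/SUM(...) formula: weighted.mean(x, w) computes sum(x * w) / sum(w).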

Inspired by @TonyHellmuth's solution, this can also be solved with a matrix product:

cbind(tbl, z= c(seq_len(dim(tbl)[1])%*% t(tbl)/rowSums(tbl)))
#  1 2 3 4 5   z
#A 3 2 0 0 0 1.4
#B 1 2 2 0 0 2.2
#C 1 1 2 1 0 2.6
#D 0 0 1 3 1 4.0
#E 0 0 0 1 4 4.8

data

tbl <-  table(unlist(df), c(col(df)))
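
Neither answer covers the SD and median the question also asks for. One possible approach, sketched below, is to expand each row of counts back into the individual ranks with rep() and then apply the ordinary mean(), sd() and median(). Note the assumptions: sd() is the sample standard deviation (n - 1 denominator), which may or may not be the one wanted, and the names ranks and stats are mine. The data are rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  rank1 = c("A", "A", "C", "B", "A"),
  rank2 = c("B", "C", "A", "A", "B"),
  rank3 = c("C", "B", "B", "C", "D"),
  rank4 = c("D", "D", "E", "D", "C"),
  rank5 = c("E", "E", "D", "E", "E"),
  stringsAsFactors = FALSE
)
tbl <- table(unlist(df), c(col(df)))

# Ranks are the column names of the frequency table
ranks <- as.numeric(colnames(tbl))

# For each item, repeat every rank as many times as it was observed,
# then summarise the expanded vector
stats <- t(apply(tbl, 1, function(f) {
  x <- rep(ranks, times = f)
  c(mean = mean(x), sd = sd(x), median = median(x))
}))

print(round(stats, 3))
```

Expanding with rep() is fine for small counts like these; for very large frequencies a formula-based weighted variance would avoid building the long vectors.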
