
Calculate weighted means for multiple grouping with different weightings in R

I've gone through many posts on SO trying to get my code to work, but I still get errors. I'm trying to calculate weighted means for many columns based on different groupings. Specifically, I want to calculate the weighted mean of traits (in this case wingL, wingW, etc.) weighted by the value column.

Here is a sample dataset (because my matrix is HUGE) and some code:

> df
    year site Species value  wingL  wingW proL proW
    2018    2      Aa   3.0  310.6   54.9   NA  1.1
    2017    2      Aa   1.0  310.6   54.9   NA  1.1
    2018    2      Bb   7.5     NA     20    3  1.0
    2017    2      Bb     5     NA     20    3  1.0
    2018    4      Aa     8  310.6   54.9   NA  1.1
    2017    4      Aa     6  310.6   54.9   NA  1.1
    2018    4      Cc     1 161.20  143.8   NA   NA
    2017    4      Cc     1 161.20  143.8   NA   NA
    2018    6      Aa    12  310.6   54.9   NA  1.1
    2018    6      Aa   9.5  310.6   54.9   NA  1.1
    2018    6      Cc     7 161.20  143.8   NA   NA
    2017    6      Cc     7 161.20  143.8   NA   NA
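For anyone who wants to run the code in this thread, here is a sketch that rebuilds the sample table as an R data frame. The values are copied as posted (including the two 2018 rows for site 6 / Aa, with 161.20 written as 161.2); storing year and site as numbers and Species as character is an assumption about the real data.

```r
# Sample data from the question, rebuilt so the examples are reproducible.
# Column types (numeric year/site, character Species) are assumed.
df <- data.frame(
  year    = c(2018, 2017, 2018, 2017, 2018, 2017, 2018, 2017, 2018, 2018, 2018, 2017),
  site    = c(2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6),
  Species = c("Aa", "Aa", "Bb", "Bb", "Aa", "Aa", "Cc", "Cc", "Aa", "Aa", "Cc", "Cc"),
  value   = c(3.0, 1.0, 7.5, 5, 8, 6, 1, 1, 12, 9.5, 7, 7),
  wingL   = c(310.6, 310.6, NA, NA, 310.6, 310.6, 161.2, 161.2, 310.6, 310.6, 161.2, 161.2),
  wingW   = c(54.9, 54.9, 20, 20, 54.9, 54.9, 143.8, 143.8, 54.9, 54.9, 143.8, 143.8),
  proL    = c(NA, NA, 3, 3, NA, NA, NA, NA, NA, NA, NA, NA),
  proW    = c(1.1, 1.1, 1.0, 1.0, 1.1, 1.1, NA, NA, 1.1, 1.1, NA, NA)
)
str(df)
```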

Here is my code:

library(data.table)
dfnew <- setDT(df)[, lapply(.SD, function(x) weighted.mean(x, value)),
                   by = c("year", "Species"), .SDcols = wingL:proW]

But all it does is drop the "value" column, which is what I want to use as my weights. Basically, I want to calculate the weighted mean across rows for columns wingL:proW. Then, once I have those results, I will eventually average across all species (Aa, Bb) at each site.
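The value column is not actually lost: it is used as the weights inside `j` and is simply not returned, because `j` only returns the `lapply()` output. The two-step plan described above (weighted means per species, then an average across species at each site) could look like the sketch below. It uses only the site-2 rows, and `dt`, `traits`, `by_species` and `by_site` are hypothetical names. Note that step 2 is an unweighted mean, so every species counts equally; that gives different numbers than a single weighted mean per year/site.

```r
library(data.table)

# Site-2 rows from the question's sample data
dt <- data.table(
  year    = c(2018, 2017, 2018, 2017),
  site    = c(2, 2, 2, 2),
  Species = c("Aa", "Aa", "Bb", "Bb"),
  value   = c(3.0, 1.0, 7.5, 5),
  wingL   = c(310.6, 310.6, NA, NA),
  wingW   = c(54.9, 54.9, 20, 20),
  proL    = c(NA, NA, 3, 3),
  proW    = c(1.1, 1.1, 1.0, 1.0)
)
traits <- c("wingL", "wingW", "proL", "proW")  # hypothetical helper vector

# Step 1: weighted mean of each trait within year/site/Species.
# `value` is not in .SDcols, but j refers to it by name, so data.table
# still makes it available; it just is not part of the result.
by_species <- dt[, lapply(.SD, function(x) weighted.mean(x, value, na.rm = TRUE)),
                 by = .(year, site, Species), .SDcols = traits]

# Step 2: plain (unweighted) average of the species means per year/site.
# A species with no data for a trait yields NaN in step 1; na.rm drops it here.
by_site <- by_species[, lapply(.SD, mean, na.rm = TRUE),
                      by = .(year, site), .SDcols = traits]
by_site
```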

With the code below I was able to correctly create a new df with just one new column (wingL_wm), but I can't figure out how to scale this to the many columns I have:

library(dplyr)
dfnew <- df %>%
          group_by(year, site) %>%
          summarise(wingL_wm = weighted.mean(wingL, value))
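One common way to scale the single-column `summarise()` call to many columns is dplyr's `across()`, which applies the same function to a range of columns and can name the outputs with a suffix. This is a sketch assuming dplyr 1.0.0 or later (where `across()` and `.groups` exist), run on the site-2 rows only; `na.rm = TRUE` is needed because several trait columns contain NA.

```r
library(dplyr)

# Site-2 rows from the question's sample data
df <- data.frame(
  year    = c(2018, 2017, 2018, 2017),
  site    = c(2, 2, 2, 2),
  Species = c("Aa", "Aa", "Bb", "Bb"),
  value   = c(3.0, 1.0, 7.5, 5),
  wingL   = c(310.6, 310.6, NA, NA),
  wingW   = c(54.9, 54.9, 20, 20),
  proL    = c(NA, NA, 3, 3),
  proW    = c(1.1, 1.1, 1.0, 1.0)
)

dfnew <- df %>%
  group_by(year, site) %>%
  summarise(
    # weighted.mean() on every column from wingL to proW, weighted by `value`;
    # NA trait values (and their weights) are dropped within each group
    across(wingL:proW, ~ weighted.mean(.x, value, na.rm = TRUE),
           .names = "{.col}_WM"),
    .groups = "drop"
  )
dfnew
```

The `.names = "{.col}_WM"` argument produces the wingL_WM ... proW_WM columns from the desired output, and the original columns are left untouched.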

Hope that makes sense. Thanks for the help! Here is a generic version of the desired output, where each "x" should be the calculated weighted mean:

year site   wingL_WM  wingW_WM   proL_WM proW_WM
2018    2       x        x         x        x       
2017    2       x        x         x        x
2018    4       x        x         x        x
2017    4       x        x         x        x
2018    6       x        x         x        x    
2017    6       x        x         x        x

dfnew <- setDT(df)[, lapply(.SD, function(x) weighted.mean(x, value, na.rm = TRUE)),
                   by = c("year", "site"), .SDcols = wingL:proW]

I had to include na.rm = TRUE! I think this gives the correct results. Thanks, everyone, for helping me think it through; my grouping was also wrong (I was overthinking it).
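Why `na.rm = TRUE` matters can be seen with base R's `weighted.mean()` on two values from the sample (wingL for Aa and Bb at site 2 in 2018, weighted by value). Without it a single NA makes the result NA; with it, the NA entry and its weight are both removed, so the denominator shrinks as well. If every value in a group is NA, the result is NaN. The variable names below are illustrative only.

```r
# wingL is NA for species Bb; weights come from the value column
wingL <- c(310.6, NA)
value <- c(3.0, 7.5)

# Any NA in x propagates to the result
without_narm <- weighted.mean(wingL, value)
# The NA row and its weight (7.5) are dropped, leaving 310.6 * 3 / 3
with_narm <- weighted.mean(wingL, value, na.rm = TRUE)
# Nothing left after dropping NAs: 0 / 0 gives NaN
all_missing <- weighted.mean(c(NA_real_, NA_real_), value, na.rm = TRUE)

c(without_narm, with_narm, all_missing)
```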

It does replace the original values, but I can live with that.
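If the concern is that the result reuses the original column names (wingL rather than wingL_WM), one option is to rename the aggregated table by reference with data.table's `setnames()`. The input table is not modified by the aggregation; only the new result is renamed. This is a sketch on the site-2 rows, and `dt`, `traits` and `res` are hypothetical names.

```r
library(data.table)

# Site-2 rows from the question's sample data
dt <- data.table(
  year    = c(2018, 2017, 2018, 2017),
  site    = c(2, 2, 2, 2),
  Species = c("Aa", "Aa", "Bb", "Bb"),
  value   = c(3.0, 1.0, 7.5, 5),
  wingL   = c(310.6, 310.6, NA, NA),
  wingW   = c(54.9, 54.9, 20, 20),
  proL    = c(NA, NA, 3, 3),
  proW    = c(1.1, 1.1, 1.0, 1.0)
)
traits <- c("wingL", "wingW", "proL", "proW")  # hypothetical helper vector

# Same aggregation as the answer above: weighted means per year/site
res <- dt[, lapply(.SD, function(x) weighted.mean(x, value, na.rm = TRUE)),
          by = c("year", "site"), .SDcols = traits]

# Rename result columns in place: wingL -> wingL_WM, and so on
setnames(res, traits, paste0(traits, "_WM"))
res
```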
