
How to calculate weighted mean using mutate_at in R?

I have a dataframe ("df") with a number of columns that I would like to estimate the weighted means of, weighting by population (df$Population) and grouping by commuting zone (df$cz).

This is the list of columns I would like to estimate the weighted means of:

 vlist = c("Public_Welf_Total_Exp", "Welf_Cash_Total_Exp", "Welf_Cash_Cash_Assist", "Welf_Ins_Total_Exp","Total_Educ_Direct_Exp", "Higher_Ed_Total_Exp", "Welf_NEC_Cap_Outlay","Welf_NEC_Direct_Expend", "Welf_NEC_Total_Expend", "Total_Educ_Assist___Sub", "Health_Total_Expend", "Total_Hospital_Total_Exp", "Welf_Vend_Pmts_Medical","Hosp_Other_Total_Exp","Unemp_Comp_Total_Exp", "Unemp_Comp_Cash___Sec", "Total_Unemp_Rev", "Hous___Com_Total_Exp", "Hous___Com_Construct")

This is the code I have been using:

 df = df %>% group_by(cz) %>% mutate_at(vlist, weighted.mean(., df$Population))

I have also tried:

 df = df %>% group_by(cz) %>% mutate_at(vlist, function(x) weighted.mean(x, df$Population)) 

I also tested the following code on only two columns:

 df = df %>% group_by(cz) %>% mutate_at(vars(Public_Welf_Total_Exp, Welf_Cash_Total_Exp), weighted.mean(., df$Population)) 

However, everything I have tried gives me the following error, even though there are no NAs in any of my variables:

 Error in weighted.mean.default(., df$Population) : 
   'x' and 'w' must have the same length

I understand that I could do this estimation using lapply, but I don't know how to group by another variable using lapply. I would appreciate any suggestions!

There is a lot to unpack here...

  1. You probably mean summarise rather than mutate, because mutate would just repeat each group's result on every row of that group.
  2. mutate_at and summarise_at are superseded; you should use across instead.
  3. Your code failed for two reasons. First, in the attempts that pass weighted.mean(., df$Population) directly, the function is not written as a formula (there is no ~ at the beginning); the function(x) version does not have this problem. Second, all attempts use df$Population instead of Population. When you write Population, summarise knows you mean the column Population which, at that point, is grouped like the rest of the dataframe. When you use df$Population you are calling the column of the original dataframe without grouping. Not only is that wrong, it also raises an error, because the length of the variable you are trying to average (one group's rows) does not match the length of the weights supplied by df$Population (all rows).
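To make the length mismatch concrete, here is a minimal sketch on made-up data. The data frame toy and its column x are hypothetical stand-ins for df and one of the columns in vlist; only cz and Population keep their names from the question.

```r
library(dplyr)

# Hypothetical toy data: two commuting zones with three rows each
toy <- data.frame(
  cz         = c("a", "a", "a", "b", "b", "b"),
  Population = c(1, 2, 3, 4, 5, 6),
  x          = c(10, 20, 30, 40, 50, 60)
)

# Inside a grouped verb, a bare column name is the current group's slice,
# whereas toy$Population is always the full, ungrouped column.
lengths_seen <- toy %>%
  group_by(cz) %>%
  summarise(n_x      = length(x),               # 3 in each group
            n_bare   = length(Population),      # 3 in each group
            n_dollar = length(toy$Population),  # 6 in each group
            .groups  = "drop")

# Using the ungrouped weights fails because the lengths differ (3 vs 6)
failed <- tryCatch({
  toy %>% group_by(cz) %>% summarise(x = weighted.mean(x, toy$Population))
  FALSE
}, error = function(e) TRUE)

# Using the bare column name works: the weights are sliced per group too
ok_result <- toy %>%
  group_by(cz) %>%
  summarise(x = weighted.mean(x, Population), .groups = "drop")
```

Here failed ends up TRUE, while ok_result has one weighted mean per commuting zone.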

Here is how you could do it:

library(dplyr)

df %>%
   group_by(cz) %>% 
   summarise(across(all_of(vlist), ~ weighted.mean(.x, Population)),
             .groups = "drop")
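If it helps to see why summarise is usually what you want here, the sketch below runs the same across() call under both verbs on made-up data. The names toy, cols, x and y are hypothetical; cols plays the role of vlist. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: two commuting zones with three rows each
toy <- data.frame(
  cz         = c("a", "a", "a", "b", "b", "b"),
  Population = c(1, 2, 3, 4, 5, 6),
  x          = c(10, 20, 30, 40, 50, 60),
  y          = c(6, 5, 4, 3, 2, 1)
)
cols <- c("x", "y")  # stands in for vlist

# summarise: one row per commuting zone
summarised <- toy %>%
  group_by(cz) %>%
  summarise(across(all_of(cols), ~ weighted.mean(.x, Population)),
            .groups = "drop")

# mutate: keeps every original row and repeats the group's value on each
mutated <- toy %>%
  group_by(cz) %>%
  mutate(across(all_of(cols), ~ weighted.mean(.x, Population))) %>%
  ungroup()

nrow(summarised)  # number of commuting zones
nrow(mutated)     # number of rows in toy
```

The mutate form can still be useful when you need the group-level weighted mean next to the row-level data, for example to compute deviations from it.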

If you really need to use summarise_at (for example because you are on a version of dplyr older than 1.0.0), then you could do:

df %>%
   group_by(cz) %>% 
   summarise_at(vlist, ~weighted.mean(., Population)) %>%
   ungroup()

I assumed df and vlist look like the following:

vlist <- c("Public_Welf_Total_Exp", "Welf_Cash_Total_Exp", "Welf_Cash_Cash_Assist", "Welf_Ins_Total_Exp","Total_Educ_Direct_Exp", "Higher_Ed_Total_Exp", "Welf_NEC_Cap_Outlay","Welf_NEC_Direct_Expend", "Welf_NEC_Total_Expend", "Total_Educ_Assist___Sub", "Health_Total_Expend", "Total_Hospital_Total_Exp", "Welf_Vend_Pmts_Medical","Hosp_Other_Total_Exp","Unemp_Comp_Total_Exp", "Unemp_Comp_Cash___Sec", "Total_Unemp_Rev", "Hous___Com_Total_Exp", "Hous___Com_Construct")
df <- as.data.frame(matrix(rnorm(length(vlist) * 100), ncol = length(vlist)))
names(df) <- vlist
df$cz <- rep(letters[1:10], each = 10)
df$Population <- runif(100)
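For the lapply route mentioned in the question, one possible base R sketch is to split the data by commuting zone first and then apply weighted.mean within each piece. This is only an illustration on made-up data (toy, cols, x and y are hypothetical names, with cols standing in for vlist), but it can also serve as a cross-check on the dplyr result.

```r
# Hypothetical toy data: two commuting zones with three rows each
toy <- data.frame(
  cz         = c("a", "a", "a", "b", "b", "b"),
  Population = c(1, 2, 3, 4, 5, 6),
  x          = c(10, 20, 30, 40, 50, 60),
  y          = c(6, 5, 4, 3, 2, 1)
)
cols <- c("x", "y")  # stands in for vlist

# One data frame per commuting zone
by_cz <- split(toy, toy$cz)

# Within each piece, take the weighted mean of every column in cols,
# using that piece's own Population values as weights
wm <- do.call(rbind, lapply(by_cz, function(g) {
  sapply(g[cols], weighted.mean, w = g$Population)
}))

wm  # matrix with one row per commuting zone and one column per variable
```

Because the weights come from g$Population, they always have the same length as the column being averaged, which is exactly what the grouped dplyr version does for you.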
