How to calculate weighted mean using mutate_at in R?
I have a dataframe ("df") with a number of columns that I would like to estimate the weighted means of, weighting by population (df$Population), and grouping by commuting zone (df$cz).
This is the list of columns I would like to estimate the weighted means of:
vlist = c("Public_Welf_Total_Exp", "Welf_Cash_Total_Exp", "Welf_Cash_Cash_Assist", "Welf_Ins_Total_Exp","Total_Educ_Direct_Exp", "Higher_Ed_Total_Exp", "Welf_NEC_Cap_Outlay","Welf_NEC_Direct_Expend", "Welf_NEC_Total_Expend", "Total_Educ_Assist___Sub", "Health_Total_Expend", "Total_Hospital_Total_Exp", "Welf_Vend_Pmts_Medical","Hosp_Other_Total_Exp","Unemp_Comp_Total_Exp", "Unemp_Comp_Cash___Sec", "Total_Unemp_Rev", "Hous___Com_Total_Exp", "Hous___Com_Construct")
This is the code I have been using:
df = df %>% group_by(cz) %>% mutate_at(vlist, weighted.mean(., df$Population))
I have also tried:
df = df %>% group_by(cz) %>% mutate_at(vlist, function(x) weighted.mean(x, df$Population))
I also tested the following code on only 2 columns:
df = df %>% group_by(cz) %>% mutate_at(vars(Public_Welf_Total_Exp, Welf_Cash_Total_Exp), weighted.mean(., df$Population))
However, everything I have tried gives me the following error, even though there are no NAs in any of my variables:
Error in weighted.mean.default(., df$Population) :
'x' and 'w' must have the same length
I understand that I could do this estimation using lapply, but I don't know how to group by another variable using lapply. I would appreciate any suggestions!
There is a lot to unpack here...
First, you probably want summarise instead of mutate, because with mutate you would just replicate your result for each row.
Second, mutate_at and summarise_at are superseded and you should use across instead.
Third, your code was not working because the function was not written as a formula (there was no ~ at the beginning), and also because you were using df$Population instead of Population. When you write Population, summarise knows you're talking about the column Population which, at that point, is grouped like the rest of the dataframe. When you use df$Population you are calling the column of the original dataframe without grouping. Not only is that wrong, but you also get an error because the length of the variable you are trying to average and the length of the weights provided by df$Population do not match.
Here is how you could do it:
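library(dplyr)
df %>%
  group_by(cz) %>%
  # all_of() selects the columns named in the character vector vlist;
  # Population is the per-group slice of the weights column
  summarise(across(all_of(vlist), ~ weighted.mean(.x, Population)),
            .groups = "drop")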
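To make the length mismatch described above concrete, here is a minimal sketch on a small made-up data frame (the name toy and its values are assumptions for illustration, not the asker's data). Inside a grouped call the bare column name gives the per-group slice, while toy$Population always gives the full column.

```r
library(dplyr)

# Hypothetical toy data: two commuting zones, three rows each
toy <- data.frame(
  cz         = rep(c("a", "b"), each = 3),
  x          = c(1, 2, 3, 10, 20, 30),
  Population = c(1, 1, 2, 5, 3, 2)
)

# Compare the lengths seen inside each group
lens <- toy %>%
  group_by(cz) %>%
  summarise(n_x         = length(x),              # per-group slice
            n_w_grouped = length(Population),     # per-group slice
            n_w_dollar  = length(toy$Population), # full, ungrouped column
            .groups = "drop")

# With the bare column name, values and weights stay aligned per group
res <- toy %>%
  group_by(cz) %>%
  summarise(x = weighted.mean(x, Population), .groups = "drop")
```

In lens, the first two counts are equal within each group, while the third is the row count of the whole data frame, which is why weighted.mean complains that 'x' and 'w' must have the same length.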
If you really need to use summarise_at (probably because you are using a version of dplyr older than 1.0.0), then you could do:
df %>%
  group_by(cz) %>%
  summarise_at(vlist, ~weighted.mean(., Population)) %>%
  ungroup()
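As for the difference between mutate and summarise mentioned earlier, the following sketch may help. It uses a small hypothetical data frame (toy is an assumption, not the asker's data) and shows that mutate keeps every row and repeats the group result, while summarise returns one row per group.

```r
library(dplyr)

# Hypothetical toy data: two groups of two rows each
toy <- data.frame(
  cz         = rep(c("a", "b"), each = 2),
  x          = c(1, 3, 10, 30),
  Population = c(1, 3, 1, 1)
)

# mutate: same number of rows, group result repeated on each row
mutated <- toy %>%
  group_by(cz) %>%
  mutate(x = weighted.mean(x, Population)) %>%
  ungroup()

# summarise: one row per commuting zone
summarised <- toy %>%
  group_by(cz) %>%
  summarise(x = weighted.mean(x, Population), .groups = "drop")
```

If you actually need the group mean attached to every original row (for example to compute deviations from it), mutate is the right choice; for a table of one weighted mean per commuting zone, summarise is.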
I assumed df and vlist to look like the following:
vlist <- c("Public_Welf_Total_Exp", "Welf_Cash_Total_Exp", "Welf_Cash_Cash_Assist", "Welf_Ins_Total_Exp","Total_Educ_Direct_Exp", "Higher_Ed_Total_Exp", "Welf_NEC_Cap_Outlay","Welf_NEC_Direct_Expend", "Welf_NEC_Total_Expend", "Total_Educ_Assist___Sub", "Health_Total_Expend", "Total_Hospital_Total_Exp", "Welf_Vend_Pmts_Medical","Hosp_Other_Total_Exp","Unemp_Comp_Total_Exp", "Unemp_Comp_Cash___Sec", "Total_Unemp_Rev", "Hous___Com_Total_Exp", "Hous___Com_Construct")
df <- as.data.frame(matrix(rnorm(length(vlist) * 100), ncol = length(vlist)))
names(df) <- vlist
df$cz <- rep(letters[1:10], each = 10)
df$Population <- runif(100)
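As a sanity check, the grouped result can be compared with a weighted mean computed by hand for one commuting zone. This is only a sketch under a few assumptions: it uses a two-column subset (vlist_small and df_small are hypothetical names), and the seed is added purely for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: not in the original, added for reproducibility

# Two of the columns from vlist, simulated the same way as above
vlist_small <- c("Public_Welf_Total_Exp", "Welf_Cash_Total_Exp")
df_small <- as.data.frame(matrix(rnorm(length(vlist_small) * 100),
                                 ncol = length(vlist_small)))
names(df_small) <- vlist_small
df_small$cz <- rep(letters[1:10], each = 10)
df_small$Population <- runif(100)

# Grouped weighted means, one row per commuting zone
res <- df_small %>%
  group_by(cz) %>%
  summarise(across(all_of(vlist_small), ~ weighted.mean(.x, Population)),
            .groups = "drop")

# Manual weighted mean for commuting zone "a": sum(x * w) / sum(w)
a <- df_small[df_small$cz == "a", ]
manual <- sum(a$Public_Welf_Total_Exp * a$Population) / sum(a$Population)

# Should be TRUE if the grouped weights were applied correctly
isTRUE(all.equal(res$Public_Welf_Total_Exp[res$cz == "a"], manual))
```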