How do I add a column that is the proportion of a factor level?
I'm working with UK data. I want to add a column that is the proportion of factor level genhealth == 1 by agelfted.
So far, my idea is to create a separate column that is the number of genhealth == 1 by agelfted, then to create an additional column that is the number of obs by agelfted where genhealth != 1, and then to simply divide the first created column by the second. I'm really not making it very far with this strategy:
oreo$gh<-aggregate(oreo, by=c(subset("genhealth"==1),"agelifted"), FUN = "sum")
Error: unexpected symbol in "oreo$gh<-aggregate(oreo, by=c(subset("genhealth"==1),"agelifted") FUN"
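The counting strategy described above can be sketched in base R roughly as follows. This is only an illustration on a small made-up data frame: the toy object and its values are assumptions, not the real oreo data. Note that the sketch divides by the total number of observations per agelfted; dividing by the count of genhealth != 1 instead would give the odds rather than the proportion.

```r
# Hypothetical toy data standing in for oreo (values are made up)
toy <- data.frame(
  genhealth = c(1, 3, 1, 1, 2, 1),
  agelfted  = c(15, 14, 15, 14, 15, 16)
)

# Number of rows with genhealth == 1 for each agelfted value
n_good <- tapply(toy$genhealth == 1, toy$agelfted, sum)

# Total number of rows for each agelfted value
n_all <- tapply(toy$genhealth, toy$agelfted, length)

# Per-group proportion, then looked up for every row by its agelfted value
prop_by_age <- n_good / n_all
toy$Prop <- as.numeric(prop_by_age[as.character(toy$agelfted)])
```

Every row ends up carrying the proportion for its own agelfted group, which is the same shape of result the question asks for.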
A dplyr approach:
library(dplyr)

oreo %>%
  group_by(agelfted) %>%                          # for every agelfted value
  mutate(Prop = sum(genhealth == 1) / n()) %>%    # rows with genhealth == 1 divided by total rows in the group
  ungroup()                                       # forget the grouping
# # A tibble: 102,816 x 8
# sex yobirth year genhealth longst_illness age agelfted Prop
# <fct> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
# 1 female 47.0 84.0 1.00 2.00 37 15.0 0.552
# 2 male 24.0 84.0 3.00 1.00 60 14.0 0.456
# 3 female 31.0 84.0 1.00 1.00 53 35.0 0.705
# 4 male 29.0 84.0 1.00 2.00 55 14.0 0.456
# 5 male 39.0 84.0 1.00 2.00 45 18.0 0.706
# 6 female 35.0 84.0 1.00 1.00 49 15.0 0.552
# 7 male 42.0 84.0 1.00 2.00 42 15.0 0.552
# 8 female 43.0 84.0 1.00 1.00 41 16.0 0.646
# 9 male 49.0 84.0 3.00 1.00 35 15.0 0.552
#10 female 40.0 84.0 2.00 2.00 44 15.0 0.552
# # ... with 102,806 more rows
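If one row per agelfted value is more useful than a repeated value on every row, a summarise() call is one possible variant. The sketch below again uses a made-up toy data frame (an assumption, not the real oreo data), and relies on the fact that the mean of a logical vector is the share of TRUE values.

```r
library(dplyr)

# Hypothetical toy data standing in for oreo (values are made up)
toy <- data.frame(
  genhealth = c(1, 3, 1, 1, 2, 1),
  agelfted  = c(15, 14, 15, 14, 15, 16)
)

by_age <- toy %>%
  group_by(agelfted) %>%
  summarise(
    Prop = mean(genhealth == 1),  # mean of a logical vector is the share of TRUEs
    n    = n()                    # group size, for reference
  )
```

The resulting table has one row per agelfted value and could be joined back onto the original data with left_join() if the per-row column is needed after all.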