group_by and summarize() multiple things in R using dplyr/tidyverse
I am trying to find the country with the highest average age, but I also need to filter out countries with fewer than 5 entries in the data frame. I tried the following, but it does not work:
bil %>%
group_by(citizenship,age) %>%
mutate(n=count(citizenship), theMean=mean(age,na.rm=T)) %>%
filter(n>=5) %>%
arrange(desc(theMean))
bil is the dataset. I am trying to count how many entries I have for each country, filter out countries with fewer than 5 entries, find the average age for each country, and then find the country with the highest average. I am confused about how to do both things at the same time. If I do one summarize at a time, I lose the rest of my data.
Perhaps this could help. Note that the parameter 'x' in count() is a tbl/data.frame, so it cannot be applied to a single column inside mutate(). So, instead of count(), we group by 'citizenship' and get the frequency of values with n(), get the mean() of 'age' (not sure about 'age' as a grouping variable), and do the filter():
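bil %>%
  group_by(citizenship) %>%                        # group by country only, not by age
  mutate(n = n()) %>%                              # number of rows per country
  mutate(theMean = mean(age, na.rm = TRUE)) %>%    # mean age per country, ignoring NA
  filter(n >= 5) %>%                               # keep countries with at least 5 entries
  arrange(desc(theMean))                           # highest mean age first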
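If only one row per country is needed, a variant using summarise() instead of mutate() may be closer to what the question title asks for, since a single summarise() call can compute several summaries at once. The sketch below is only an illustration: the real bil data is not shown in the question, so the data frame here is a made-up stand-in with hypothetical country labels and ages.

```r
library(dplyr)

# Hypothetical stand-in for `bil`; the real data set is not shown in the question.
bil <- data.frame(
  citizenship = c(rep("A", 5), rep("B", 6), rep("C", 2)),
  age         = c(40, 50, 60, 70, 80,
                  30, 35, 40, 45, 50, NA,
                  90, 95)
)

result <- bil %>%
  group_by(citizenship) %>%
  summarise(
    n       = n(),                      # number of rows per country (NA ages included)
    theMean = mean(age, na.rm = TRUE)   # mean age per country, ignoring NA
  ) %>%
  filter(n >= 5) %>%                    # drop countries with fewer than 5 entries
  arrange(desc(theMean))                # highest mean age first

print(result)
```

The first row of result is the country with the highest average age among those with at least 5 entries. Unlike the mutate() version, this collapses the data to one row per country, so use mutate() when the original rows must be kept.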