
Calculate difference from the group mean using dplyr

I want to calculate how far each row is from its group's mean. Is there a way to do this without creating an intermediate table and joining it back?

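library(dplyr)

# Per-group means in an intermediate table
group_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Join the means back on cyl and subtract
left_join(mtcars, group_summary, by = "cyl") %>%
  mutate(mpg_diff_from_group = mpg - mean_mpg)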

Yes, the following works without an intermediate table:

mtcars %>%
  group_by(cyl) %>%
  mutate(grouped_diff = mpg - mean(mpg)) %>%
  ungroup()

# Alternatively, keep the group mean as its own column
mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg), mpg_diff_from_grp = mpg - mean_mpg) %>%
  ungroup()
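
As a quick sanity check, the sketch below compares the grouped mutate() result with the join-based approach from the question. It assumes dplyr is installed; the names joined and grouped are only illustrative and do not come from the original posts.

```r
library(dplyr)

# Join-based approach from the question
group_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

joined <- left_join(mtcars, group_summary, by = "cyl") %>%
  mutate(mpg_diff_from_group = mpg - mean_mpg)

# Grouped mutate(): mean(mpg) is evaluated within each cyl group
grouped <- mtcars %>%
  group_by(cyl) %>%
  mutate(grouped_diff = mpg - mean(mpg)) %>%
  ungroup()

# Both approaches keep the row order of mtcars, so the columns line up
all.equal(joined$mpg_diff_from_group, grouped$grouped_diff)
```

Because mutate() keeps every row and column, the grouped version is a drop-in replacement for the join whenever the summary is only needed to compute a per-row value.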

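Similar to the code above, you can use summarise() instead of mutate() and then ungroup. Unlike mutate(), this keeps only cyl and the computed difference, and in dplyr 1.1.0 and later a summarise() that returns more than one row per group is deprecated in favor of reframe().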

mtcars %>%
  group_by(cyl) %>%
  summarise(grouped_diff = mpg - mean(mpg)) %>%
  ungroup()
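
On dplyr 1.1.0 or later, reframe() is the verb intended for summaries that return more than one row per group. A minimal sketch, assuming that version is installed (the name diffs is illustrative); reframe() returns an ungrouped data frame, so no ungroup() call is needed:

```r
library(dplyr)

# reframe() allows any number of rows per group and returns ungrouped output
diffs <- mtcars %>%
  group_by(cyl) %>%
  reframe(grouped_diff = mpg - mean(mpg))

# Only cyl and grouped_diff are kept, one row per original observation
head(diffs)
```

If the other mtcars columns are still needed alongside the difference, the mutate() version is the better fit.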
