How can I calculate mean and sd by group and format as dataframe?

My data is currently in the format of df1:

outcome <- c("success", "failure", "success", "failure", "success", "failure")
basketball <- c(10, 7, 7, 8, 9, 10)
soccer <- c(8, 21, 30,  21, 6, 10)
football <- c(9,  2,  1, 3, 1, 5)

df1 <-  data.frame(outcome, basketball, soccer, football)

I would like it to be in the format of df2, so that I can more easily create a bar graph with ggplot2:

symptom <-  c("basketball",  "basketball", "soccer", "soccer", "football", "football")
mean <-  c(10, 6, 9, 7, 3, 1)
sd <-  c(1, 2, 1, 3, 0.5, 0.2)

df2 <- data.frame(outcome, symptom, mean, sd)

Currently I have a lot of code that gets me there in a roundabout way, but I feel there must be a streamlined way to do this in a few lines. Is there a way to do this using dplyr or tidyr verbs?

Thanks!

We can reshape to 'long' format with pivot_longer and then do a group-by operation, summarising each outcome/symptom combination:

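library(dplyr)
library(tidyr)
df1 %>%
  # stack the three sport columns into symptom/value pairs
  pivot_longer(cols = basketball:football, names_to = 'symptom') %>% 
  # one group per outcome/symptom combination
  group_by(outcome, symptom) %>%
  # mean and standard deviation of each group; drop the grouping afterwards
  summarise(mean = mean(value), sd = sd(value), .groups = 'drop')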
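As an alternative sketch (not the approach used in the answer above), the same table can be built by summarising first with across() and pivoting afterwards. This assumes the default across() column naming of `{column}_{function}` and that no column name contains an underscore, because `names_sep = "_"` splits on it. The name `res` is only illustrative.

```r
library(dplyr)
library(tidyr)

# sample data from the question
df1 <- data.frame(
  outcome    = c("success", "failure", "success", "failure", "success", "failure"),
  basketball = c(10, 7, 7, 8, 9, 10),
  soccer     = c(8, 21, 30, 21, 6, 10),
  football   = c(9, 2, 1, 3, 1, 5)
)

res <- df1 %>%
  group_by(outcome) %>%
  # produces basketball_mean, basketball_sd, soccer_mean, ...
  summarise(across(basketball:football, list(mean = mean, sd = sd)),
            .groups = "drop") %>%
  # split each name at "_": the first part becomes the symptom,
  # the second part (".value") becomes the output columns mean and sd
  pivot_longer(-outcome, names_to = c("symptom", ".value"), names_sep = "_")

res
```

Both routes give one row per outcome/symptom pair. The pivot-first version in the answer is less sensitive to column names, so it is probably the safer default.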

If we also need to plot, the summary can be piped straight into ggplot:

library(ggplot2)
df1 %>%
  pivot_longer(cols = basketball:football, names_to = 'symptom') %>% 
  group_by(outcome, symptom) %>% 
  summarise(mean = mean(value), sd = sd(value), .groups = 'drop') %>%
  ggplot(aes(x = outcome, y = mean, fill = symptom)) + 
    # side-by-side bars, one per symptom within each outcome
    geom_bar(position = position_dodge(), stat = 'identity') + 
    # error bars of +/- 1 sd, dodged by the same width as the bars
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
            width = .2, position = position_dodge(.9))
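
Since the question asks for a data frame, note that summarise() returns a tibble. A minimal sketch, assuming the sample df1 from the question, that stores the summary once under the name df2 and converts it with as.data.frame():

```r
library(dplyr)
library(tidyr)

# sample data from the question
df1 <- data.frame(
  outcome    = c("success", "failure", "success", "failure", "success", "failure"),
  basketball = c(10, 7, 7, 8, 9, 10),
  soccer     = c(8, 21, 30, 21, 6, 10),
  football   = c(9, 2, 1, 3, 1, 5)
)

df2 <- df1 %>%
  pivot_longer(cols = basketball:football, names_to = "symptom") %>%
  group_by(outcome, symptom) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop") %>%
  # convert the tibble to a plain base R data.frame
  as.data.frame()

str(df2)
```

The stored df2 can then be passed to ggplot() with the same aes() and layers as in the plotting code above, so the summary is not recomputed for each plot.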
