Plot the mean and standard deviation of each numeric column of a data frame in R

Question

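I want to plot each numeric column as a bar at its mean, with the standard deviation drawn as a line through the bar. How can I do this for the iris dataset?

I am trying to reshape my dataset so that it is easy to plot in ggplot2.

What I tried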

iris %>%
  dplyr::select_if(is.numeric) %>%
  dplyr::summarise(avg_sepal_length = mean(Sepal.Length),
                  avg_sepal_width = mean(Sepal.Width),
                  avg_petal_length = mean(Petal.Length),
                  avg_petal_width = mean(Petal.Width),
                  sd_sepal_length = sd(Sepal.Length),
                  sd_sepal_width = sd(Sepal.Width),
                  sd_petal_length = sd(Petal.Length),
                  sd_petal_width = sd(Petal.Width))

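I would like to pivot this into two columns, so that the data frame looks like this:

stat            mean            sd
sepal_length    5.843333        0.8280661
sepal_width     3.057333        0.4358663
petal_length    3.758           1.765298
petal_width     1.199333        0.7622377

Then I want to plot the mean as a bar in ggplot, with a line running from mean - sd to mean + sd.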

Answer 1

The output format you describe is not the ideal one for ggplot2, which prefers long data like this:


library(tidyr); library(dplyr)

iris %>%
  summarise(
        across(
            where(is.double), 
            list(mean = mean, sd = sd)
        )
    )  |>
    pivot_longer(
        everything(), 
        names_sep = "_", 
        names_to = c("feature", "stat")
    )  


# A tibble: 8 x 3
#   feature      stat  value
#   <chr>        <chr> <dbl>
# 1 Sepal.Length mean  5.84
# 2 Sepal.Length sd    0.828
# 3 Sepal.Width  mean  3.06
# 4 Sepal.Width  sd    0.436
# 5 Petal.Length mean  3.76
# 6 Petal.Length sd    1.77
# 7 Petal.Width  mean  1.20
# 8 Petal.Width  sd    0.762

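Since you are familiar with the iris dataset, it is worth reading the across documentation, which uses it heavily in its examples.

To get the format you asked for, you can add the following to the pipeline: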

|>
    pivot_wider(names_from = "stat")

# A tibble: 4 x 3
#   feature       mean    sd
#   <chr>        <dbl> <dbl>
# 1 Sepal.Length  5.84 0.828
# 2 Sepal.Width   3.06 0.436
# 3 Petal.Length  3.76 1.77 
# 4 Petal.Width   1.20 0.762
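
The wide table can then be plotted directly. A minimal sketch, assuming ggplot2 is installed; the object name iris_wide and the choice of geom_errorbar for the range are my own assumptions, not part of the answer above:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Summarise every numeric column, then reshape to one row per feature.
iris_wide <- iris |>
  summarise(across(where(is.double), list(mean = mean, sd = sd))) |>
  pivot_longer(everything(), names_sep = "_", names_to = c("feature", "stat")) |>
  pivot_wider(names_from = "stat")

# Bars at the mean, with an error bar spanning mean - sd to mean + sd.
p <- ggplot(iris_wide, aes(x = feature, y = mean)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)
p
```

Inside aes(), mean and sd refer to the columns of iris_wide, not to the base R functions.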

Answer 2

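To get the result you want, you can first simplify your code with dplyr::across. After that, you can convert to long format with pivot_longer, where the .value sentinel puts the means and sds in their own columns. Finally, you can draw the plot as a combination of geom_col and geom_pointrange: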

library(dplyr)
library(tidyr)
library(ggplot2)

iris_sum <- iris %>%
  summarise(across(where(is.numeric), .fns = list(avg = mean, sd = sd), .names = "{.fn}_{.col}")) |> 
  pivot_longer(everything(), names_to = c(".value", "name"), names_sep = "_") |> 
  mutate(name = gsub("\\.", '_', tolower(name)))

iris_sum
#> # A tibble: 4 × 3
#>   name           avg    sd
#>   <chr>        <dbl> <dbl>
#> 1 sepal_length  5.84 0.828
#> 2 sepal_width   3.06 0.436
#> 3 petal_length  3.76 1.77 
#> 4 petal_width   1.20 0.762

ggplot(iris_sum, aes(name, avg)) +
  geom_col() +
  geom_pointrange(aes(ymin = avg - sd, ymax = avg + sd))
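
The .value sentinel in names_to is what keeps avg and sd as separate columns. A small sketch on a made-up one-row summary may make this clearer; the data frame toy and its numbers are invented purely for illustration:

```r
library(tidyr)

# One-row summary with names of the form "<stat>_<column>" (made-up values).
toy <- data.frame(avg_a = 1, sd_a = 0.1, avg_b = 2, sd_b = 0.2)

# ".value" means: the part of each name before "_" becomes an output column,
# while the part after "_" is stored in the new "name" column.
toy_long <- pivot_longer(
  toy,
  everything(),
  names_to = c(".value", "name"),
  names_sep = "_"
)
toy_long
```

The result has one row per original column (a and b) and the two statistics side by side, which is the shape geom_pointrange needs.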

Answer 3

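You could simply try:

library(dplyr)
library(tidyr)
library(ggplot2)

# mean_sdl needs the Hmisc package; mult = 1 gives mean +/- 1 sd
iris %>%
  dplyr::select_if(is.numeric) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(name, value)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1))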
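
The mean_sdl summary relies on the Hmisc package, and on its own it draws only a point with a range, not a bar. If Hmisc is not available, the same summary can be sketched with a small helper, since fun.data accepts any function that returns a data frame with columns y, ymin and ymax. The helper name mean_sd below is hypothetical, not part of ggplot2:

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper: mean plus/minus one standard deviation,
# returned in the shape stat_summary(fun.data = ...) expects.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Long format: one row per (column name, value) pair of the numeric columns.
iris_long <- pivot_longer(iris[sapply(iris, is.numeric)], everything())

p <- ggplot(iris_long, aes(name, value)) +
  stat_summary(fun = mean, geom = "bar") +  # bar at the mean
  stat_summary(fun.data = mean_sd)          # default geom: pointrange
p
```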

Answer 4

Note that you do not actually need to preprocess the data frame to compute the summary values; you can use ggplot2's stat_summary directly:

library(ggplot2)

ggplot(stack(iris), aes(x = ind, y = values)) + 
  stat_summary(geom = "bar", fun = mean) + 
  stat_summary(
    fun = mean, 
    fun.min = function(x) mean(x) - sd(x), 
    fun.max = function(x) mean(x) + sd(x))

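Here I used base R's simple stack function to make a long version of the iris dataset; you can use any library you like (especially if you want to include other operations).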
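
To see why the aesthetics are ind and values, it helps to inspect what stack returns. A quick check; the name iris_long is just for illustration:

```r
# stack() keeps only plain vector columns, so the factor Species is dropped
# (R may warn that non-vector columns are ignored).
iris_long <- stack(iris)

names(iris_long)        # "values" "ind"
nlevels(iris_long$ind)  # 4, one level per numeric column
nrow(iris_long)         # 600 = 150 rows x 4 columns
```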

4 answers

Answer 1  0  2022-07-05 14:33:00
Answer 2  0  accepted  2022-07-05 14:35:08
Answer 3  0  2022-07-05 15:00:29
Answer 4  0  2022-07-05 15:01:02
