Creating a data frame with mean, standard deviation, standard error and confidence interval
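I am trying to use this data frame to create a new data frame that contains the mean, standard deviation (sd), standard error (se), and confidence interval (ci) of each variable.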
pct.df <- structure(list(group = c("a", "a", "a", "b", "b", "b"), gender = c("male",
"female", "male", "female", "male", "female"), var_a = c(33.3333333333333,
16.6666666666667, 50, 50, 50, 33.3333333333333), var_b = c(50,
75, 50, 75, 75, 75), var_c = c(50, 75, 75, 100, 75, 75), var_d = c(50,
25, 0, 25, 50, 50), var_e = c(25, 0, 50, 0, 50, 25), var_f = c(25,
25, 0, 50, 50, 25), var_g = c(25, 25, 0, 50, 50, 25), var_h = c(25,
25, 0, 50, 50, 25), avg = c(35.4166666666667, 33.3333333333333,
28.125, 50, 56.25, 41.6666666666667)), class = "data.frame", row.names = c(NA,
-6L))
I want to compare the mean of each variable (i.e. var_a through var_h, plus avg) between group a and group b.
I already know how to calculate the means.
library(tidyverse)

sum.df <- pct.df %>%
  group_by(group) %>%
  summarise_if(is.numeric, mean) %>%
  pivot_longer(cols = -group, names_to = "Variable")
However, I am also trying to get the sd, se, and ci of each var_ column in the same data frame.
I tried to adapt something similar from https://www.r-graph-gallery.com/4-barplot-with-error-bar.html to get what I want.
my_sum <- data %>%
  group_by(Species) %>%
  summarise(
    n = n(),
    mean = mean(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  mutate(se = sd / sqrt(n)) %>%
  mutate(ic = se * qt((1-0.05)/2 + .5, n-1))
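But I can't get it to work, because I don't understand how to handle multiple variables. I'm new to R and would appreciate any suggestions or alternative approaches I should look into.
Note: ideally, the output would look something like this: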
group Variables mean sd se ci
1 a var_a 38 16 22 54
2 a var_b 69 24 45 93
3 a var_c 75 20 55 95
4 a var_d 44 12 32 56
5 a var_e 31 24 7 55
6 a var_f 38 14 24 52
7 a var_g 38 14 24 52
8 a var_h 38 14 24 52
9 a AVG 46 14 32 60
10 b var_a 58 29 29 87
11 b var_b 81 12 69 93
12 b var_c 88 14 74 102
13 b var_d 56 31 25 87
14 b var_e 56 31 25 87
15 b var_f 56 31 25 87
16 b var_g 56 31 25 87
17 b var_h 56 31 25 87
18 b AVG 64 25 39 89
Based on your comments and the updated post, this solution should work:
library(tidyverse)
pct.df <- structure(list(group = c("a", "a", "a", "b", "b", "b"), gender = c("male",
"female", "male", "female", "male", "female"), var_a = c(33.3333333333333,
16.6666666666667, 50, 50, 50, 33.3333333333333), var_b = c(50,
75, 50, 75, 75, 75), var_c = c(50, 75, 75, 100, 75, 75), var_d = c(50,
25, 0, 25, 50, 50), var_e = c(25, 0, 50, 0, 50, 25), var_f = c(25,
25, 0, 50, 50, 25), var_g = c(25, 25, 0, 50, 50, 25), var_h = c(25,
25, 0, 50, 50, 25), avg = c(35.4166666666667, 33.3333333333333,
28.125, 50, 56.25, 41.6666666666667)), class = "data.frame", row.names = c(NA,
-6L))
pct.df %>%
  pivot_longer(-c(group, gender, avg), names_to = "variable") %>%
  group_by(group, variable) %>%
  summarise(n = n(),
            mean = mean(value),
            sd = sd(value),
            se = sd/sqrt(n),
            ic = se * qt((1-0.05)/2 + .5, n-1)) %>%
  select(-n) %>%
  bind_rows(summarise(., across(everything(),
                                ~if(is.numeric(.)) mean(.) else "AVG"))) %>%
  arrange(group)
#> `summarise()` has grouped output by 'group'. You can override using the `.groups` argument.
#> # A tibble: 18 x 6
#> # Groups: group [2]
#> group variable mean sd se ic
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a var_a 33.3 16.7 9.62 41.4
#> 2 a var_b 58.3 14.4 8.33 35.9
#> 3 a var_c 66.7 14.4 8.33 35.9
#> 4 a var_d 25 25 14.4 62.1
#> 5 a var_e 25 25 14.4 62.1
#> 6 a var_f 16.7 14.4 8.33 35.9
#> 7 a var_g 16.7 14.4 8.33 35.9
#> 8 a var_h 16.7 14.4 8.33 35.9
#> 9 a AVG 32.3 17.4 10.0 43.1
#> 10 b var_a 44.4 9.62 5.56 23.9
#> 11 b var_b 75 0 0 0
#> 12 b var_c 83.3 14.4 8.33 35.9
#> 13 b var_d 41.7 14.4 8.33 35.9
#> 14 b var_e 25 25 14.4 62.1
#> 15 b var_f 41.7 14.4 8.33 35.9
#> 16 b var_g 41.7 14.4 8.33 35.9
#> 17 b var_h 41.7 14.4 8.33 35.9
#> 18 b AVG 49.3 13.3 7.71 33.2
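To see where the numbers in one row come from, here is a minimal base-R sketch that recomputes the statistics for group a, var_a by hand. The three values are copied from `pct.df`; the variable names (`x`, `tcrit`, and so on) are only illustrative. It assumes, as the pipeline above does, that `ic` means the half-width of a 95% confidence interval based on the t distribution.

```r
# Hand check of one cell group: group "a", variable var_a (values from pct.df).
x <- c(33.3333333333333, 16.6666666666667, 50)

n     <- length(x)
m     <- mean(x)                  # sample mean
s     <- sd(x)                    # sample standard deviation (n - 1 denominator)
se    <- s / sqrt(n)              # standard error of the mean
tcrit <- qt(0.975, df = n - 1)    # same quantile as qt((1-0.05)/2 + .5, n-1)
ic    <- se * tcrit               # half-width of the 95% confidence interval

round(c(mean = m, sd = s, se = se, ic = ic), 2)

# The interval itself is the mean minus/plus the half-width.
c(lower = m - ic, upper = m + ic)
```

Up to rounding, the first result should agree with row 1 of the table above (33.3, 16.7, 9.62, 41.4). With only three observations per group the t quantile is large (about 4.3), which is why the intervals are so wide.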
Here is a similar approach that does not require pivoting the data. It relies on the fact that summarise can also return vectors.
pct.df %>%
  group_by(group) %>%
  summarise(
    val = c("mean", "sd", "n", "se", "ic"),
    across(
      where(is.numeric),
      # se is sd / sqrt(n); ic is the half-width of the 95% confidence interval
      ~ c(mean(.x), sd(.x), length(.x), sd(.x) / sqrt(length(.x)),
          sd(.x) / sqrt(length(.x)) * qt((1-0.05)/2 + .5, length(.x) - 1))
    ), .groups = "drop"
  )
# A tibble: 10 x 11
group val var_a var_b var_c var_d var_e var_f var_g var_h avg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a mean 33.3 58.3 66.7 25 25 16.7 16.7 16.7 32.3
2 a sd 16.7 14.4 14.4 25 25 14.4 14.4 14.4 3.76
3 a n 3 3 3 3 3 3 3 3 3
4 a se 9.62 8.33 8.33 14.4 14.4 8.33 8.33 8.33 2.17
5 a ic 41.4 35.9 35.9 62.1 62.1 35.9 35.9 35.9 9.33
6 b mean 44.4 75 83.3 41.7 25 41.7 41.7 41.7 49.3
7 b sd 9.62 0 14.4 14.4 25 14.4 14.4 14.4 7.32
8 b n 3 3 3 3 3 3 3 3 3
9 b se 5.56 0 8.33 8.33 14.4 8.33 8.33 8.33 4.22
10 b ic 23.9 0 35.9 35.9 62.1 35.9 35.9 35.9 18.2
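If the layout from the question is preferred (one row per group and variable, one column per statistic), a wide result like this can be reshaped afterwards. The following is only a sketch under stated assumptions: `toy`, `wide_stats` and `long_stats` are hypothetical names, the data are made up rather than taken from `pct.df`, and it uses `reframe()`, which requires dplyr 1.1.0 or later. In those versions `summarise()` warns when a group returns more than one row; on older versions `summarise()` plays the same role, as in the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the poster's pct.df.
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  var_a = c(10, 20, 30, 40, 50, 60),
  var_b = c(5, 5, 8, 1, 2, 3)
)

# Wide layout: one row per (group, statistic), one column per variable.
wide_stats <- toy %>%
  group_by(group) %>%
  reframe(
    val = c("mean", "sd", "n", "se", "ic"),
    across(
      where(is.numeric),
      ~ c(mean(.x), sd(.x), length(.x),
          sd(.x) / sqrt(length(.x)),
          sd(.x) / sqrt(length(.x)) * qt(0.975, length(.x) - 1))
    )
  )

# Long layout: one row per (group, variable), one column per statistic.
long_stats <- wide_stats %>%
  pivot_longer(-c(group, val), names_to = "variable") %>%
  pivot_wider(names_from = val, values_from = value)

long_stats
```

The reshaping step itself does not depend on how the wide table was produced, so the same `pivot_longer()` / `pivot_wider()` pair should also apply to the result computed from `pct.df`.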