
Normality test for multi-grouped data in R

I'm trying to run a normality test on my data in R. My dataset is a data frame with four grouping columns (groups, treatment, chase and measure) and one numeric column (value). I'm using the rstatix package, and other statistical tests such as wilcox_test() and kruskal_test() work well, but when I try to run shapiro_test() it fails with the following error:

data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
  Must group by variables found in `.data`.
  * Column `variable` is not found.
Backtrace:
  1. dplyr::group_by(., treatment, chase, measure)
  2. rstatix::shapiro_test(., value)
 33. rstatix:::.f(.x[[i]], ...)
 11. dplyr::group_by(., variable)
 43. dplyr::group_by_prepare(.data, ..., .add = .add)

My dataset is the following:

    groups treatment chase measure     value
1   uncoated   control    30  colocA 17.912954
2   uncoated   control    30  colocA 16.806409
3   uncoated   control    30  colocA 20.322467
4   uncoated   control    30  colocA 15.953959
5   uncoated   control    30  colocA 22.566408
6   uncoated   control    30  colocA 17.780975
7   uncoated   control    30  colocA 19.764265
8   uncoated   control    30  colocA 16.928500
9   uncoated   control    30  colocA 22.931763
10  uncoated   control    30  colocA 18.101085
11  uncoated   control    30  distCC  1.159298
12  uncoated   control    30  distCC  1.174931
13  uncoated   control    30  distCC  1.190449
14  uncoated   control    30  distCC  1.265717
15  uncoated   control    30  distCC  1.103845
16  uncoated   control    30  distCC  1.125344
17  uncoated   control    30  distCC  1.290703
18  uncoated   control    30  distCC  1.172462
19  uncoated   control    30  distCC  1.065353
20  uncoated   control    30  distCC  1.048523
21    coated   control    30  colocA  6.062000
22    coated   control    30  colocA  9.370714
23    coated   control    30  colocA 12.898769
24    coated   control    30  colocA 20.398458
25    coated   control    30  colocA 11.174150
26    coated   control    30  colocA 17.574250
27    coated   control    30  colocA 12.481857
28    coated   control    30  colocA 21.565250
29    coated   control    30  colocA 21.743409
30    coated   control    30  colocA 12.699600
31    coated   control    30  distCC  4.317260
32    coated   control    30  distCC  4.263914
33    coated   control    30  distCC  5.136013
34    coated   control    30  distCC  3.142906
35    coated   control    30  distCC  2.617590
36    coated   control    30  distCC  4.149614
37    coated   control    30  distCC  4.995551
38    coated   control    30  distCC  3.851803
39    coated   control    30  distCC  4.606119
40    coated   control    30  distCC  2.820326

Thank you in advance.

Here is a way to do it with stats::shapiro.test and broom::tidy:

library(dplyr)
library(broom)

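# Run one Shapiro-Wilk test per treatment/chase/measure combination;
# broom::tidy() turns each htest result into a one-row tibble
dat %>%
  group_by(treatment, chase, measure) %>% 
  do(tidy(shapiro.test(.$value)))
# A tibble: 2 x 6
# Groups:   treatment, chase, measure [2]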
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test
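If you would rather avoid dplyr and broom, the same per-group test can be sketched in base R with split() and lapply(). This is a minimal sketch, assuming the column names from the question; the helper name shapiro_by is hypothetical, and the data frame at the top is a compact rebuild of the question's dat with the same 40 values in the same row order.

```r
# Compact rebuild of the question's `dat` (same 40 values, same row order)
dat <- data.frame(
  groups    = rep(c("uncoated", "coated"), each = 20),
  treatment = "control",
  chase     = 30L,
  measure   = rep(rep(c("colocA", "distCC"), each = 10), times = 2),
  value     = c(17.912954, 16.806409, 20.322467, 15.953959, 22.566408,
                17.780975, 19.764265, 16.9285, 22.931763, 18.101085,
                1.159298, 1.174931, 1.190449, 1.265717, 1.103845,
                1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
                6.062, 9.370714, 12.898769, 20.398458, 11.17415,
                17.57425, 12.481857, 21.56525, 21.743409, 12.6996,
                4.31726, 4.263914, 5.136013, 3.142906, 2.61759,
                4.149614, 4.995551, 3.851803, 4.606119, 2.820326)
)

# Hypothetical helper: one Shapiro-Wilk test per combination of `by` columns
shapiro_by <- function(df, value, by) {
  # split() on a list of columns groups by their interaction;
  # drop = TRUE skips combinations that do not occur in the data
  pieces <- split(df[[value]], df[by], drop = TRUE)
  rows <- lapply(names(pieces), function(key) {
    test <- shapiro.test(pieces[[key]])
    data.frame(group     = key,
               n         = length(pieces[[key]]),
               statistic = unname(test$statistic),
               p.value   = test$p.value)
  })
  # Stack the one-row data frames into a single result table
  do.call(rbind, rows)
}

res <- shapiro_by(dat, "value", c("treatment", "chase", "measure"))
print(res)
```

Because both versions call stats::shapiro.test() on the same subsets, the W statistics and p-values should match the tibble above; only the layout differs, with a single group label built by split() (for example control.30.colocA).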

We could also wrap the tidied output in a list inside summarise() and then unnest it:

library(dplyr)
library(tidyr)
library(broom)
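# summarise() stores each tidied test result in a list-column `out`;
# unnest() then expands it into statistic, p.value and method columns
dat %>% 
    group_by(treatment, chase, measure) %>%
    summarise(out = list(tidy(shapiro.test(value))), .groups = 'drop') %>%
    unnest(c(out))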
# A tibble: 2 x 6
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test
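Both answers group by treatment, chase and measure only, so coated and uncoated observations are pooled within each measure (20 values per test). If the two coatings should be checked separately, a base-R sketch can add groups to the grouping. Since shapiro.test() stops with an error when a group has fewer than 3 or more than 5000 values, or when all values are identical, the sketch wraps the call so such groups return NA instead of aborting the whole run. The wrapper name safe_shapiro is hypothetical, and the data frame is a compact rebuild of the question's dat.

```r
# Compact rebuild of the question's `dat` (same 40 values, same row order)
dat <- data.frame(
  groups    = rep(c("uncoated", "coated"), each = 20),
  treatment = "control",
  chase     = 30L,
  measure   = rep(rep(c("colocA", "distCC"), each = 10), times = 2),
  value     = c(17.912954, 16.806409, 20.322467, 15.953959, 22.566408,
                17.780975, 19.764265, 16.9285, 22.931763, 18.101085,
                1.159298, 1.174931, 1.190449, 1.265717, 1.103845,
                1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
                6.062, 9.370714, 12.898769, 20.398458, 11.17415,
                17.57425, 12.481857, 21.56525, 21.743409, 12.6996,
                4.31726, 4.263914, 5.136013, 3.142906, 2.61759,
                4.149614, 4.995551, 3.851803, 4.606119, 2.820326)
)

# Hypothetical wrapper: shapiro.test() stops with an error for n < 3,
# n > 5000 or constant data; return NAs for such groups instead
safe_shapiro <- function(x) {
  out <- tryCatch(shapiro.test(x), error = function(e) NULL)
  if (is.null(out)) {
    return(c(statistic = NA_real_, p.value = NA_real_))
  }
  c(statistic = unname(out$statistic), p.value = out$p.value)
}

# Split by all four grouping columns, so coated and uncoated are tested apart
pieces <- split(dat$value, dat[c("groups", "treatment", "chase", "measure")],
                drop = TRUE)

# sapply() gives a 2 x k matrix (statistic, p.value); t() makes one row per group
res <- data.frame(group = names(pieces),
                  t(sapply(pieces, safe_shapiro)),
                  row.names = NULL)
print(res)
```

Here each test uses 10 values; any group that comes back as NA is one that shapiro.test() itself would have refused.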
 

Data:

dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated"), treatment = c("control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954, 
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265, 
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449, 
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523, 
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857, 
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906, 
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40"))

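Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.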

 
© 2020-2024 STACKOOM.COM