How can I summarize a column that shows the mean difference in R?

library(dplyr)  # provides tibble(), add_row() and %>%

tableData <- tibble(Fruits = sample(c('Apple', 'Banana', 'Orange'), 30, TRUE),
                    Ripeness = sample(c('yes', 'no'), 30, TRUE),
                    Mean = ifelse(Ripeness == 'yes', 1.4 + runif(30), 1.6 + runif(30))) %>%
  add_row(Fruits = "Peach", Ripeness = "yes", Mean = 5)

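# Return the t-test p-value, or NA if t.test() throws an error
# (e.g. when a group has only one ripeness level)
get_t_test_pval <- function(formula){
  tryCatch({t.test(formula)$p.value}, error = function(cond) NA)
}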

tableData %>% 
  group_by(Fruits) %>% 
  summarise(t_test_pval = get_t_test_pval(Mean ~ Ripeness))

The code above summarizes the table so that a p-value is computed for each fruit. Is it possible to also add a column that shows the mean difference (i.e. the mean for 'yes' ripeness minus the mean for 'no' ripeness) for each fruit, also wrapped in tryCatch?

One option: after grouping by 'Fruits', use summarise to subtract the mean of 'Mean' where 'Ripeness' is 'no' from the mean of 'Mean' where 'Ripeness' is 'yes', and apply the OP's function to get 't_test_pval'. Peach has no 'no' rows, so its difference comes out as NaN and its p-value as NA.

library(dplyr)
tableData %>%
  group_by(Fruits) %>%
  summarise(Meandiff = mean(Mean[Ripeness == 'yes']) -
                       mean(Mean[Ripeness == 'no']),
            t_test_pval = get_t_test_pval(Mean ~ Ripeness))
# A tibble: 4 x 3
#  Fruits Meandiff t_test_pval
#  <chr>     <dbl>       <dbl>
#1 Apple     0.122       0.435
#2 Banana    0.167       0.327
#3 Orange   -0.306       0.216
#4 Peach   NaN          NA    
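
The question also asked for tryCatch on the mean difference. Wrapping the subtraction in tryCatch alone would not change the Peach row, because mean() of an empty vector returns NaN without raising an error. A minimal sketch of one way to get NA instead, assuming a hypothetical helper get_mean_diff (not part of the original answer) that raises an error itself when either ripeness level is missing, and using a small fixed data set named demo purely for illustration:

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# mean of `value` where `group` is "yes" minus the mean where it is "no".
# Returns NA_real_ if either level has no rows or if anything else errors.
get_mean_diff <- function(value, group) {
  tryCatch({
    yes <- value[group == "yes"]
    no  <- value[group == "no"]
    # mean(numeric(0)) is NaN rather than an error, so raise one explicitly
    if (length(yes) == 0 || length(no) == 0) {
      stop("both ripeness levels are required")
    }
    mean(yes) - mean(no)
  }, error = function(cond) NA_real_)
}

# Small fixed data set so the result does not depend on random draws
demo <- tibble(
  Fruits   = c("Apple", "Apple", "Apple", "Apple", "Peach"),
  Ripeness = c("yes", "yes", "no", "no", "yes"),
  Mean     = c(2, 3, 1, 2, 5)
)

result <- demo %>%
  group_by(Fruits) %>%
  summarise(Meandiff = get_mean_diff(Mean, Ripeness))
```

With tableData, the same helper could be used as Meandiff = get_mean_diff(Mean, Ripeness) next to the t_test_pval column, so that both columns show NA for groups with only one ripeness level.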
