Performing multiple paired t-tests based on groups/categories

Question

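I am stuck on running t.tests for multiple categories in RStudio. I want the result of a t.test for each product type, comparing the online and offline prices. I have over 800 product types, which is why I don't want to do this manually for each product group.

I have a data frame named data (over 2 million rows) that looks like this: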

> Product_type   Price_Online   Price_Offline   
1   A            48             37
2   B            29             22
3   B            32             40
4   A            38             36
5   C            32             27
6   C            31             35
7   C            28             24
8   A            47             42
9   C            40             36

Ideally, I would like R to write the t.test results into another data frame called product_types:

    > Product_type   
    1   A           
    2   B            
    3   C          
    4   D          
    5   E         
    6   F            
    7   G            
    8   H            
    9   I            
   800 ...

so that it becomes:

    > Product_type   t         df       p-value   interval    mean of difference
    1   A           
    2   B            
    3   C          
    4   D          
    5   E         
    6   F            
    7   G            
    8   H            
    9   I            
   800 ...

If I had each product type in its own data frame, this would be the call:

t.test(Product_A$Price_Online, Product_A$Price_Offline, mu=0, alt="two.sided", paired = TRUE, conf.level = 0.99)

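There must be an easier way to do this. Otherwise I would need to create 800+ data frames and then run the t-test 800 times.

I tried something with lists and lapply, but so far it has not worked. I also tried the approach from "t-Test on multiple columns": https://sebastiansauer.github.io/multiple-t-tests-with-dplyr/

However, in the end he still inserts male and female manually (for me that would be over 800 categories).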

Answer 1

The tidy way of doing this is to use dplyr and broom:

library(dplyr)
library(broom)

df <- data %>% 
  group_by(Product_type) %>% 
  # one paired t-test per product type; tidy() turns each result into a one-row data frame
  do(tidy(t.test(.$Price_Online, 
                 .$Price_Offline, 
                 mu = 0, 
                 alternative = "two.sided", 
                 paired = TRUE, 
                 conf.level = 0.99)))
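
In dplyr 1.0.0 and later, do() is marked as superseded. As a hedged alternative, the same per-group test can be sketched with group_modify(), which applies a function to each group and binds the returned data frames together. The sample data is rebuilt here so the block runs on its own; the name res is only illustrative.

```r
library(dplyr)
library(broom)

# Sample data (same values as the example in the question).
data <- data.frame(
  Product_type  = c("A", "B", "B", "A", "C", "C", "C", "A", "C"),
  Price_Online  = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Price_Offline = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

res <- data %>%
  group_by(Product_type) %>%
  # .x is the data frame of the current group (without the grouping column)
  group_modify(~ tidy(t.test(.x$Price_Online, .x$Price_Offline,
                             paired = TRUE, conf.level = 0.99))) %>%
  ungroup()

res
```

The result has one row per product type, with the columns tidy() produces for a t-test (estimate, statistic, p.value, parameter, conf.low, conf.high, method, alternative).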

Answer 2

One way is to use by:

result <- by(data, data$Product_type, function(x) 
  t.test(x$Price_Online, x$Price_Offline, mu=0, alt="two.sided", 
         paired=TRUE, conf.level=0.99)[c(1:9)])

To get the results into a data frame, you have to rbind them:

type.convert(as.data.frame(do.call(rbind, result)), as.is=TRUE)
#     statistic parameter   p.value             conf.int estimate null.value   stderr alternative        method
# A    2.267787         2 0.1514719  -20.25867, 32.25867        6          0 2.645751   two.sided Paired t-test
# B -0.06666667         1 0.9576214  -477.9256, 476.9256     -0.5          0      7.5   two.sided Paired t-test
# C    1.073154         3 0.3618456 -9.996192, 14.496192     2.25          0 2.096624   two.sided Paired t-test

Or, using the pipe:

do.call(rbind, result) |> as.data.frame() |> type.convert(as.is=TRUE)
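
In the output above, conf.int holds two numbers per cell, which can be awkward to filter or export. A minimal base R sketch that returns a flat data frame with the columns the question asks for might look like the following. The helper paired_row and the column names are hypothetical, not part of any package. Groups with fewer than two rows are skipped, on the assumption that with 800+ product types some will be too small for t.test to run; missing prices and groups whose differences are all identical are not handled here.

```r
# Sample data (same values as the example in the question).
data <- data.frame(
  Product_type  = c("A", "B", "B", "A", "C", "C", "C", "A", "C"),
  Price_Online  = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Price_Offline = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# Hypothetical helper: one paired t-test, returned as a one-row data frame.
paired_row <- function(x, conf.level = 0.99) {
  tt <- t.test(x$Price_Online, x$Price_Offline,
               paired = TRUE, conf.level = conf.level)
  data.frame(
    Product_type = x$Product_type[1],
    n_pairs      = nrow(x),
    t            = unname(tt$statistic),
    df           = unname(tt$parameter),
    p_value      = tt$p.value,
    conf_low     = tt$conf.int[1],
    conf_high    = tt$conf.int[2],
    mean_diff    = unname(tt$estimate)
  )
}

# One data frame per product type.
groups <- split(data, data$Product_type)
# A paired t-test needs at least two pairs, so drop smaller groups.
groups <- Filter(function(x) nrow(x) >= 2, groups)

product_types <- do.call(rbind, lapply(groups, paired_row))
rownames(product_types) <- NULL
product_types
```

Each row of product_types then corresponds to one product type, and the confidence bounds sit in ordinary numeric columns.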

Data

data <- structure(list(Product_type = c("A", "B", "B", "A", "C", "C", 
"C", "A", "C"), Price_Online = c(48L, 29L, 32L, 38L, 32L, 31L, 
28L, 47L, 40L), Price_Offline = c(37L, 22L, 40L, 36L, 27L, 35L, 
24L, 42L, 36L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9"))


Solution 1
23 2017-03-05 15:34:37

Solution 2
5 Accepted 2017-03-05 15:19:55
