R: t-statistics with subsets

Question

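I would like a table as output that shows, for certain variables, the difference in means between two specific subsets of my data together with the corresponding t-statistics.

I have the following data: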

structure(list(Name = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C", "D", "D", "D", "D"), Date = c("20.10.2018", 
"30.09.2018", "25.11.2019", "23.10.2020", "20.03.2018", "30.07.2018", 
"25.08.2019", "23.10.2020", "20.12.2018", "30.01.2018", "25.02.2019", 
"23.06.2020", "20.11.2018", "30.12.2018", "25.11.2019", "23.09.2020"
), Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03, 
0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08), Age = c(5L, 
5L, 6L, 7L, 8L, 8L, 9L, 10L, 4L, 4L, 5L, 6L, 1L, 1L, 2L, 3L), 
    Size = c(53336L, 75768L, 86548L, 94567L, 40234L, 40240L, 
    50243L, 60352L, 5069L, 6069L, 7092L, 8024L, 2456L, 3046L, 
    4056L, 5600L), Rating = c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA, 
    4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, 
-16L))

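More specifically, I want a table with t-statistics for the difference in means of the variables Return, Age and Size between the observations with Rating 1 and those with Rating 5. The t-statistics should sit in a column between the Rating 1 and Rating 5 columns and should include stars indicating the p-value.

I tried the t.test function, but I struggle to apply it only to the subgroups and to create the t-statistics column in the middle, between Rating 1 and Rating 5.

The output should have the following layout: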

structure(list(c("Return", "Age", "Size"), `Mean Rating 1` = c(NA, 
NA, NA), `t-statistics including p-value (indicated as stars)` = c(NA, 
NA, NA), `Mean Rating 5` = c(NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-3L))

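Could someone help me with the code?

Thank you very much.

Edit 22.04.2022:

Question 1: How do I need to adjust the code from the answer if I want the output to look as follows (no values yet, this only illustrates the layout I want)?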

structure(list(c("Return", "Age", "Size"), `Mean Rating 1` = c(NA, 
NA, NA), `Mean Rating 2` = c(NA, NA, NA), `Mean Rating 3` = c(NA, 
NA, NA), `Mean Rating 4` = c(NA, NA, NA), `Mean Rating 5` = c(NA, 
NA, NA), `Mean Rating NA` = c(NA, NA, NA), `Difference in means Rating 5 and Rating 1` = c(NA, 
NA, NA), `p-value for differences in means Rating 5 and Rating 1` = c(NA, 
NA, NA), `stars for p-value for differences in means Rating 5 and Rating 1` = c(NA, 
NA, NA)), class = "data.frame", row.names = c(NA, -3L))

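Question 2: When I want to compare the difference in means between two groups, is a t-test or an F-test the better choice? I chose the t-test because, as far as I know, it is the right test for comparing the means of two groups, whereas the F-test is the better choice for comparing the standard deviations of two groups. Is my understanding correct?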
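As a rough illustration of the distinction raised in Question 2, the sketch below runs both tests on the Size values of the Rating 1 and Rating 5 observations from the data above. t.test compares the two means (Welch's test by default), while var.test is the F test for the ratio of two variances. The names size_r1 and size_r5 are introduced here for illustration only.

```r
# Size values copied from the question's data (illustrative variable names).
size_r1 <- c(53336, 75768, 86548)                # Rating == 1
size_r5 <- c(40234, 40240, 6069, 8024, 4056)     # Rating == 5

# Difference in means: Welch two-sample t-test (var.equal = FALSE is the default).
mean_test <- t.test(size_r1, size_r5)

# Difference in spread: F test on the ratio of the two variances.
var_test <- var.test(size_r1, size_r5)

print(c(t = unname(mean_test$statistic), p = mean_test$p.value))
print(c(F = unname(var_test$statistic), p = var_test$p.value))
```

The t-statistic and p-value printed first should agree with the Size row in the answer's table, since both come from the same Welch test.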

Answer 1

You can easily loop over the subsets. Note that subset= is only an argument of the formula method of t.test; the default (vector) method silently ignores it, so every iteration would test the full am vector and return identical rows. Subsetting the vector directly avoids this:

t(with(mtcars, sapply(unique(cyl), \(i) t.test(am[cyl == i]))))
# Returns a 3 x 10 list-matrix: one row per cyl value (in the order of unique(cyl)),
# one column per htest component (statistic, parameter, p.value, conf.int, estimate,
# null.value, stderr, alternative, method, data.name).
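With the formula interface of t.test, subset= is evaluated inside data=, so a two-sample comparison within one subgroup can be sketched as follows. The example uses the built-in mtcars data, and the object names (res_subset, res_manual, p_by_cyl) are illustrative only.

```r
# Welch t-test of mpg by transmission type (am), restricted to 4-cylinder cars.
res_subset <- t.test(mpg ~ am, data = mtcars, subset = cyl == 4)

# Equivalent formulation: filter the data frame before testing.
res_manual <- t.test(mpg ~ am, data = mtcars[mtcars$cyl == 4, ])

# The same test repeated for every cylinder group, keeping only the p-value.
cyl_values <- sort(unique(mtcars$cyl))
p_by_cyl <- vapply(
  cyl_values,
  function(i) t.test(mpg ~ am, data = mtcars, subset = cyl == i)$p.value,
  numeric(1)
)
names(p_by_cyl) <- cyl_values
print(p_by_cyl)
```

Both formulations should give the same statistic; which one to use is mostly a matter of taste.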

More specifically, for your data (with dat being the data frame from the question), you can do this:

tcols <- c('Return', 'Age', 'Size')
r <- t(with(subset(dat, Rating %in% c(1, 5)), 
     sapply(setNames(tcols, tcols), \(i) unlist(
       t.test(reformulate('Rating', i))[
         c('estimate', 'statistic', 'p.value')]
       ))))
cbind(as.data.frame(r),
      ' '=c("   ", "*  ", "** ", "***")[
        rowSums(outer(r[, 'p.value'], c(Inf, 0.05, 0.01, 0.001), `<`))])
#        estimate.mean in group 1 estimate.mean in group 5 statistic.t   p.value    
# Return             4.666667e-02                     0.05  -0.1552301 0.8883096    
# Age                5.333333e+00                     5.60  -0.2198599 0.8353634    
# Size               7.188400e+04                 19724.60   4.0457818 0.0109848 *

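Note: R >= 4.1 is required, since the code uses the \(x) lambda shorthand and, in the edit below, the native pipe |>.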
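The star column is built with a small lookup trick that may be easier to follow in separate steps. The sketch below uses the three p-values from the table above; the intermediate names (p, below, idx, stars) are illustrative.

```r
# p-values for Return, Age and Size, copied from the table above.
p <- c(0.8883096, 0.8353634, 0.0109848)

# One row per p-value, one column per threshold; TRUE where p is below it.
# The Inf column is always TRUE, so every row sum is at least 1.
below <- outer(p, c(Inf, 0.05, 0.01, 0.001), `<`)

# The row sum counts the thresholds passed and doubles as an index
# into the vector of star labels (1 = no star, 4 = three stars).
idx <- rowSums(below)
stars <- c("   ", "*  ", "** ", "***")[idx]
print(stars)
```

Padding the labels to equal width keeps the column aligned when the data frame is printed.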

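Edit

For the layout requested in Question 1 (this reuses tcols and r from above; rows with a missing Rating are dropped by aggregate, so there is no column for Rating NA):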

as.data.frame(t(with(subset(dat, Rating %in% c(1, 5)), 
       sapply(setNames(tcols, tcols), \(i) unlist(
         t.test(reformulate('Rating', i))[
           c('estimate', 'statistic', 'p.value')]
       ))))) |>
  {\(.) cbind(mean.diff.5.1=apply(.[1:2], 1, diff), .[3:4])}() |> 
  cbind(' '=c("   ", "*  ", "** ", "***")[
          rowSums(outer(r[, 'p.value'], c(Inf, 0.05, 0.01, 0.001), `<`))],
        `colnames<-`(t(aggregate(. ~ Rating, dat[3:6], mean)[-1]), 
                     paste0('mean.rating.', 1:5))) |>
  {\(.) .[c(5:9, 1:4)]}()
#        mean.rating.1 mean.rating.2 mean.rating.3 mean.rating.4 mean.rating.5 mean.diff.5.1 statistic.t   p.value    
# Return  4.666667e-02          0.07          0.01        0.0525          0.05  3.333333e-03  -0.1552301 0.8883096    
# Age     5.333333e+00          7.00          9.00        3.2500          5.60  2.666667e-01  -0.2198599 0.8353634    
# Size    7.188400e+04      94567.00      50243.00     5201.7500      19724.60 -5.215940e+04   4.0457818 0.0109848 *
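The table above has no column for the observations whose Rating is missing. A step-by-step sketch that also reports that group, without pipes, might look like the following. It rebuilds only the numeric columns of the question's data, and the names grp, means, tests and res are assumptions made for this illustration, not part of the original answer.

```r
# Numeric columns of the question's data, retyped here so the block is self-contained.
dat <- data.frame(
  Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
             0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08),
  Age    = c(5, 5, 6, 7, 8, 8, 9, 10, 4, 4, 5, 6, 1, 1, 2, 3),
  Size   = c(53336, 75768, 86548, 94567, 40234, 40240, 50243, 60352,
             5069, 6069, 7092, 8024, 2456, 3046, 4056, 5600),
  Rating = c(1, 1, 1, 2, 5, 5, 3, NA, 4, 5, 4, 5, NA, 4, 5, 4)
)
vars <- c("Return", "Age", "Size")

# Label missing ratings explicitly so they form a group of their own.
grp <- factor(ifelse(is.na(dat$Rating), "NA", dat$Rating),
              levels = c(1:5, "NA"))

# Group means: one row per variable, one column per rating (including "NA").
means <- sapply(split(dat[vars], grp), colMeans)
colnames(means) <- paste0("mean.rating.", colnames(means))

# Welch t-tests of Rating 5 versus Rating 1, one per variable.
tests <- lapply(vars, function(v) {
  t.test(dat[[v]][grp == "5"], dat[[v]][grp == "1"])
})
p <- vapply(tests, function(tt) tt$p.value, numeric(1))

res <- data.frame(
  means,
  diff.5.1 = means[, "mean.rating.5"] - means[, "mean.rating.1"],
  p.value  = p,
  stars    = c("", "*", "**", "***")[1 + (p < 0.05) + (p < 0.01) + (p < 0.001)]
)
print(res)
```

Because the groups are passed as Rating 5 first and Rating 1 second, the sign of the t-statistic would be the opposite of the one in the table above, while the p-values are unaffected.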

Solution 1 (3, accepted, 2022-04-08 06:27:01)
