How to extract estimates and p-values from cor.test results across multiple groups?

Question

I have a dataset with multiple groups at multiple time points (example excerpt below):

I've been trying to run multiple cor.test calls between X and Y within each status and each gender.

I had trouble working out the group comparisons, so I filtered by status (using filter()) and split my per-gender cor.test runs into Status = Red and Status = Blue.

Here is my current code, which runs cor.test on each gender:

    library(dplyr)
    red_status <- all %>% filter(status == "Red")
    cor_red <- by(red_status, red_status$gender, 
                  FUN = function(df) cor.test(df$X, df$Y, method = "spearman"))

The output shows 3 separate cor.test results, one per gender:


    red_status$gengrp: M
        Spearman's rank correlation rho
    
    data:  df$X and df$Y
    S = 123.45, p-value = 0.123
    alternative hypothesis: true rho is not equal to 0
    sample estimates:
         rho 
    0.123456 
    
    ----------------------------------
    
    red_status$gengrp: F
    ... (same output style as gengrp: M ^)
    
    ----------------------------------
    red_status$gengrp: O
    ... (same output style as gengrp: M ^)

What I want to do is extract the estimate and p-value from all of the per-gender cor.test results and put them in a data frame.

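I thought I could use the data.frame() function to pull out the gender names and the correlation elements, then add another column for the p-value, but doing so gave me an error: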

    # Where red_status[1] is gender names (M,F,O) and red_status[[3:4]] are the Spearman p-value and rho estimate *within each gender category*
    data.frame(group = dimnames(red_status)[1], est = as.vector(red_status)[[3]], 
               pval = as.vector(red_status[[4]]))
    
    Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
      cannot coerce class ‘"htest"’ to a data.frame
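The error arises because each element of cor_red is an htest object: a list of named components (statistic, p.value, estimate, method and so on) that data.frame() cannot coerce as a whole. A minimal sketch on two made-up vectors (a and b are hypothetical, not the question's data) shows the components and how the scalar pieces can be pulled out by name before building a data frame:

```r
# Two made-up vectors (hypothetical data, not from the question)
a <- c(1, 2, 3, 4, 5, 6, 7, 8)
b <- c(2, 1, 4, 3, 6, 5, 8, 7)

ct <- cor.test(a, b, method = "spearman")

class(ct)   # "htest"
names(ct)   # the components stored in the test result

# Extract the scalar pieces by name; unname() drops the "S" and "rho" labels
one_row <- data.frame(statistic = unname(ct$statistic),
                      estimate  = unname(ct$estimate),
                      p_value   = ct$p.value)
one_row
```

The same by-name access (ct$estimate, ct$p.value) works on each element of a by() result, since every element is one of these htest lists.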

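Because I filtered on Status == Red, I have to rerun the code to get the per-gender cor.test results for Status == Blue, and then at the end bind all the estimates and p-values into one data frame.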
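Filtering and rerunning per status may not be necessary: by() also accepts a list of grouping factors as its INDICES argument, so one call can cover every status and gender combination. A sketch, assuming lowercase columns named status, gender, x and y (simulated here with the same recipe as Answer 2):

```r
# Simulated data using the same recipe as Answer 2 (assumed column names)
set.seed(9108171)
df <- data.frame(
  gender = rep(c("F", "M", "O"), 40),
  status = c(rep("Red", 60), rep("Blue", 60)),
  x = round(rnorm(120, mean = 6, sd = 2), 1),
  y = round(rnorm(120, mean = 8, sd = 1), 1)
)

# One by() call over both factors: the result is a status x gender array of htest objects
cor_all <- by(df, list(status = df$status, gender = df$gender),
              FUN = function(d) cor.test(d$x, d$y, method = "spearman"))

dim(cor_all)             # 2 statuses by 3 genders
cor_all[["Red", "M"]]    # the test for Status == Red, gender M
```

Each cell can then be read with $estimate and $p.value, exactly as for a single cor.test() result.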

My goal is to create a data frame showing the correlation estimate and p-value for each status and gender:

    Status   Gender   Estimate(rho)    P-value
       Red        M            1.23      0.123
       Red        F            0.45      0.054
       Red        O             ...        ...
      Blue        M           0.004      0.123
      Blue        F             ...        ...
      Blue        O             ...        ...
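One way to build this table in a single pass, without filtering by status first, is to split() the data on every status/gender combination, run cor.test() on each piece and stack one-row data frames. A base R sketch; the simulated data and the column names Status, Gender, Estimate and P_value are assumptions chosen to mirror the table above:

```r
# Simulated data (same recipe as Answer 2); replace with your own data frame
set.seed(9108171)
df <- data.frame(
  gender = rep(c("F", "M", "O"), 40),
  status = c(rep("Red", 60), rep("Blue", 60)),
  x = round(rnorm(120, mean = 6, sd = 2), 1),
  y = round(rnorm(120, mean = 8, sd = 1), 1)
)

# One data frame per status/gender combination; drop = TRUE skips empty combinations
groups <- split(df, list(df$status, df$gender), drop = TRUE)

# One Spearman test per group, reduced to a one-row data frame
rows <- lapply(groups, function(d) {
  ct <- cor.test(d$x, d$y, method = "spearman")
  data.frame(Status   = d$status[1],
             Gender   = d$gender[1],
             Estimate = unname(ct$estimate),
             P_value  = ct$p.value)
})

result <- do.call(rbind, rows)
# Red rows first (FALSE sorts before TRUE), then genders alphabetically
result <- result[order(result$Status != "Red", result$Gender), ]
rownames(result) <- NULL
result
```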

Any help or tips would be greatly appreciated.

Answer 1

This answer is based on @Ruam Pimentel's comment about the rstatix package.

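Using a pipe and the cor_test() function, I can group by status and gender and run the correlation tests. cor_test() is designed to output a data frame containing all the elements of the correlation test (i.e. statistic, p-value, estimate), depending on the correlation method chosen.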

Here is the code that works:

    library(dplyr)
    library(rstatix)
    
    r <- dfall %>%
      group_by(status, gender) %>%
      cor_test(X, Y, method = "spearman")

Result (numbers edited):

      status  gengrp var1   var2   cor  statistic    p   method  
      <chr>   <fct>  <chr>  <chr> <dbl>     <dbl>  <dbl> <chr>   
    1 Red       M     X       Y    0.98    -0.123  0.123  Spearman
    2 Red       F     X       Y    0.12     0.456  0.456  Spearman
    3 Red       O     X       Y    0.34     0.944  0.789  Spearman
    4 Blue      M     X       Y    0.56     0.246  0.101  Spearman
    5 Blue      F     X       Y    0.78    0.4107  0.111  Spearman
    6 Blue      O     X       Y     0.9     0.123  0.122  Spearman

Answer 2

Although the rstatix solution needs far less code than a base R solution, I'm posting code to show that the requested output can be created in base R.

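The key to the base R solution is knowing how to navigate the object returned by cor.test() to extract the requested pieces, convert them into a one-row data frame, and use rbind() to combine the resulting list of data frames into a single data frame.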
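Those three steps can be sketched with two made-up tests. The helper name htest_row() is hypothetical, and the inputs are invented vectors rather than this answer's data:

```r
# Hypothetical helper: reduce one htest object to a one-row data frame
htest_row <- function(ht) {
  data.frame(statistic = unname(ht$statistic),
             p_value   = ht$p.value,
             estimate  = unname(ht$estimate),
             method    = ht$method)
}

# Two made-up Spearman tests collected in a list
tests <- list(
  cor.test(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5), method = "spearman"),
  cor.test(c(1, 2, 3, 4, 5, 6), c(6, 5, 4, 3, 2, 1), method = "spearman")
)

# lapply() builds the one-row data frames; do.call(rbind, ...) stacks them
combined <- do.call(rbind, lapply(tests, htest_row))
combined
```

Calling unname() on the statistic and estimate keeps their "S" and "rho" labels from turning into row names when the rows are stacked.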

First, we generate some data.

    set.seed(9108171) # for reproducibility
    gender <- rep(c("F","M", "O"),40)
    status <- c(rep("Red",60),rep("Blue",60))
    x <- round(rnorm(120,mean = 6,sd = 2),1)
    y <- round(rnorm(120,mean = 8,sd = 1),1)
    
    df <- data.frame(gender,status, x, y)

Next, following the code in the original post, we keep only the Red rows and then run the cor.test() function.

    # filter red
    library(dplyr)
    red_status <- filter(df,status == "Red")
    cor_red <- by(red_status, red_status$gender,
                  FUN = function(df) cor.test(df$x, df$y, method = "spearman"))

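Finally, we use lapply() to extract the data and combine it into a data frame. Note the use of the [[ form of the extraction operator, which removes one level of nesting from the by object holding the cor.test() results. Also note how we use a vector of names, the gender categories, to drive lapply(). These are the labels by() used to split the data when it called cor.test() for each group.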

    # extract the required data
    theResults <- lapply(c("F","M","O"),function(x){
      aTest <- cor_red[[x]]
      data.frame(status = "Red",
                 test = names(aTest$statistic),
                 value = aTest$statistic,
                 p_value = aTest$p.value,
                 rho = aTest$estimate)
    })
    
    # rbind the results into a single data frame
    do.call(rbind,theResults)

...and the results for the Red status:

    > do.call(rbind,theResults)
       status test    value   p_value         rho
    S     Red    S 1072.672 0.4137465  0.19347947
    S1    Red    S 1396.400 0.8344303 -0.04992459
    S2    Red    S 1281.132 0.8777763  0.03674259
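The row names S, S1, S2 and the missing gender column in this output both come from the extraction code: data.frame() takes row names from the "S" label on aTest$statistic, rbind() then makes them unique, and the gender label lives only in the lapply() argument. A variant of the same extraction (a sketch, not part of the original answer) that keeps the gender and discards the inherited labels, using base R subsetting in place of dplyr::filter():

```r
set.seed(9108171) # same simulated data as above
df <- data.frame(
  gender = rep(c("F", "M", "O"), 40),
  status = c(rep("Red", 60), rep("Blue", 60)),
  x = round(rnorm(120, mean = 6, sd = 2), 1),
  y = round(rnorm(120, mean = 8, sd = 1), 1)
)

red_status <- df[df$status == "Red", ]   # base R equivalent of filter(df, status == "Red")
cor_red <- by(red_status, red_status$gender,
              FUN = function(d) cor.test(d$x, d$y, method = "spearman"))

# Drive lapply() with the group labels stored in the by object itself
redResults <- do.call(rbind, lapply(names(cor_red), function(g) {
  aTest <- cor_red[[g]]
  data.frame(status  = "Red",
             gender  = g,                       # keep the group label
             value   = unname(aTest$statistic), # unname() stops "S" becoming a row name
             p_value = aTest$p.value,
             rho     = unname(aTest$estimate))
}))
redResults
```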

We can repeat the process for the Blue status and combine the results to get the following:

    > rbind(blueResults,redResults)
        status test    value    p_value         rho
    S     Blue    S 1541.034 0.50402812 -0.15867211
    S1    Blue    S 1087.954 0.44253280  0.18198950
    S2    Blue    S 1880.742 0.06950608 -0.41409194
    S3     Red    S 1072.672 0.41374648  0.19347947
    S11    Red    S 1396.400 0.83443026 -0.04992459
    S21    Red    S 1281.132 0.87777629  0.03674259

The complete script that produces the final table is:

    set.seed(9108171) # for reproducibility
    gender <- rep(c("F","M", "O"),40)
    status <- c(rep("Red",60),rep("Blue",60))
    x <- round(rnorm(120,mean = 6,sd = 2),1)
    y <- round(rnorm(120,mean = 8,sd = 1),1)
    
    df <- data.frame(gender,status, x, y)
    
    # filter red
    library(dplyr)
    red_status <- filter(df,status == "Red")
    cor_red <- by(red_status, red_status$gender,
                  FUN = function(df) cor.test(df$x, df$y, method = "spearman"))
    
    # extract the required data
    theResults <- lapply(c("F","M","O"),function(x){
      aTest <- cor_red[[x]]
      data.frame(status = "Red",
                 test = names(aTest$statistic),
                 value = aTest$statistic,
                 p_value = aTest$p.value,
                 rho = aTest$estimate)
    })
    
    # rbind the results into a single data frame
    redResults <- do.call(rbind,theResults)
    redResults
    
    blue_status <- filter(df,status == "Blue")
    cor_blue <- by(blue_status, blue_status$gender,
                   FUN = function(df) cor.test(df$x, df$y, method = "spearman"))
    
    # extract the required data
    theResults <- lapply(c("F","M","O"),function(x){
      aTest <- cor_blue[[x]]
      data.frame(status = "Blue",
                 test = names(aTest$statistic),
                 value = aTest$statistic,
                 p_value = aTest$p.value,
                 rho = aTest$estimate)
    })
    
    # rbind the results into a single data frame
    blueResults <- do.call(rbind,theResults)
    rbind(blueResults,redResults)

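I realize I could abstract the repeated code into a helper function to reduce the total number of lines needed, but I leave that as a trivial exercise for the reader.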
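For completeness, here is one way that exercise could look. The function name cor_by_gender() and its arguments are hypothetical, and the sketch assumes the same df columns (status, gender, x, y) as the script above:

```r
# Hypothetical helper (not part of the original answer): per-gender Spearman tests for one status
cor_by_gender <- function(data, status_value) {
  sub <- data[data$status == status_value, ]
  tests <- by(sub, sub$gender,
              FUN = function(d) cor.test(d$x, d$y, method = "spearman"))
  do.call(rbind, lapply(names(tests), function(g) {
    data.frame(status  = status_value,
               gender  = g,
               value   = unname(tests[[g]]$statistic),
               p_value = tests[[g]]$p.value,
               rho     = unname(tests[[g]]$estimate))
  }))
}

# Same simulated data as the script above
set.seed(9108171)
gender <- rep(c("F", "M", "O"), 40)
status <- c(rep("Red", 60), rep("Blue", 60))
x <- round(rnorm(120, mean = 6, sd = 2), 1)
y <- round(rnorm(120, mean = 8, sd = 1), 1)
df <- data.frame(gender, status, x, y)

# One call per status, stacked into the final table
allResults <- do.call(rbind, lapply(c("Blue", "Red"), cor_by_gender, data = df))
allResults
```

Passing data = df by name lets lapply() feed each status string into the remaining status_value argument.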


2 solutions

Solution 1
1 Accepted 2022-11-24 16:26:05

Solution 2
1 2022-11-24 16:53:33
