R - Inconsistent p-values when computing a running Spearman correlation

Question

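My problem is that, when computing a running correlation, for some odd reason I do not get the same p-value for the same estimate/correlation value.

My goal is to compute a running Spearman correlation on two vectors in the same data.frame (subject1 and subject2 in the example below). My window (the length of each sub-vector) and stride (the jump/step between consecutive windows) are constant. Given the formula from Wikipedia, I should therefore get the same critical t, and hence the same p-value, for the same Spearman correlation: n is the same (the window size is fixed) and r is the same. However, my final p-values differ.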
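For reference, the t approximation usually quoted for Spearman's rank correlation is restated here; this is presumably the formula the question refers to. With $r$ the sample Spearman correlation and $n$ the number of observations in a window:

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}
```

Under the null hypothesis, $t$ approximately follows a Student's t distribution with $n - 2$ degrees of freedom, so for a fixed $n$ the two-sided p-value depends only on $|r|$.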

#Needed pkgs    
require(tidyverse)
require(pspearman)
require(gtools)

#Sample data
set.seed(528)
subject1 <- rnorm(40, mean = 85, sd = 5)

set.seed(528)
subject2 <- c(
  lag(subject1[1:21]) - 10, 
  rnorm(n = 6, mean = 85, sd = 5), 
  lag(subject1[length(subject1):28]) - 10)

df <- data.frame(subject1 = subject1, 
                 subject2 = subject2) %>% 
  rowid_to_column(var = "Time") 

df[is.na(df)] <- subject1[1] - 10

rm(subject1, subject2)

#Function for Spearman
psSpearman <- function(x, y) {
  # Spearman test with the t-distribution approximation, tidied to one row
  out <- pspearman::spearman.test(x, y,
                                  alternative = "two.sided",
                                  approximation = "t-distribution") %>%
    broom::tidy()
  return(data.frame(estimate = out$estimate,
                    statistic = out$statistic,
                    p.value = out$p.value))
}

#Running correlation along the subjects
dfRunningCor <- running(df$subject1, df$subject2, 
                        fun = psSpearman,
                        width = 20,
                        allow.fewer = FALSE, 
                        by = 1,
                        pad = FALSE, 
                        align = "right") %>% 
  t() %>% 
  as.data.frame() 

#Arranging the Results into easy to handle data.frame 
Results <- do.call(rbind.data.frame, dfRunningCor) %>% 
  t() %>%
  as.data.frame() %>%
  rownames_to_column(var = "Win") %>% 
  gather(CorValue, Value, -Win) %>% 
  separate(Win, c("fromIndex", "toIndex")) %>%
  mutate(fromIndex = as.numeric(substring(fromIndex, 2)),
         toIndex = as.numeric(toIndex)) %>%
  spread(CorValue, Value) %>% 
  arrange(fromIndex) %>% 
  select(fromIndex, toIndex, estimate, statistic, p.value)

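My problem is that when I plot Results with the estimate (Spearman rho; estimate) against the window number (fromIndex) and color by p-value, I should get "tunnels"/"paths" of similar color in the same region, and I do not. For example, in the picture, the points at the same height inside the red circles should have the same color, but they do not.

Code for the plot: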

Results %>% 
  ggplot(aes(fromIndex, estimate, color = p.value)) + 
  geom_line()

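So far, the possible causes I have found are:

1. With small samples or many ties, functions like Hmisc::rcorr() tend not to give the same p-values. This is why I use pspearman::spearman.test, which, according to what I have read, should address this problem.
2. Small sample size: I tried using a larger sample and still got the same problem.
3. I tried rounding the p-values and still got the same problem.

Thanks for your help!

Edit.

Could ggplot be doing "false" coloring? Could it be that ggplot simply carries the "last" color forward until the next point? Is that why I get "light blue" from point 5 to point 6 but "dark blue" from point 7 to point 8?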

Answer 1

The results you get for the p.value variable are consistent with the estimate values. You can check this as follows:

Results$orderestimate <- order(-abs(Results$estimate))
Results$orderp.value <- order(abs(Results$p.value))
identical(Results$orderestimate ,Results$orderp.value)
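A complementary check is to recompute the p-value directly from the estimate and the window size. The sketch below assumes the p-values come from the t approximation with n - 2 degrees of freedom (the question requests approximation = "t-distribution"); p_from_rho is a hypothetical helper written for illustration, not a function from pspearman.

```r
# Hypothetical helper: two-sided p-value from Spearman's rho via the
# t approximation, assuming n observations per window.
p_from_rho <- function(rho, n) {
  t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Toy values: equal |rho| with equal n must give equal p-values.
p <- p_from_rho(c(0.5, 0.5, -0.5), n = 20)
print(p)
```

On the question's data, one could compare p_from_rho(Results$estimate, n = 20) with Results$p.value using all.equal(); if they agree, any mismatch in the plot comes from how the colors are drawn rather than from the test itself.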

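I do not think you should use color for p.value in your chart; it is an unnecessary visual distraction and is hard to interpret.

If I were you, I would plot only p.value and perhaps add points to indicate the sign of the estimate variable.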

p <- Results %>%
  ggplot(aes(fromIndex, p.value)) +
  geom_line()

# If you want to display the sign of the estimate
Results$estimate.sign <- as.factor(sign(Results$estimate))
p + geom_point(aes(color = estimate.sign))

Solution 1
0 2018-05-13 18:38:10