How can I reduce the dimensionality of a gene expression matrix by computing correlation coefficients?

Question

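I am interested in finding Pearson correlation coefficients for a list of genes. Basically, I have an Affymetrix gene-level expression matrix (genes in rows, sample IDs in columns), and I also have annotation data for the microarray experiment observations, with sample IDs in rows and descriptive identifiers in columns.

Data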

> expr_mat[1:8, 1:3]
             Tarca_001_P1A01 Tarca_003_P1A03 Tarca_004_P1A04
1_at                6.062215        6.125023        5.875502
10_at               3.796484        3.805305        3.450245
100_at              5.849338        6.191562        6.550525
1000_at             3.567779        3.452524        3.316134
10000_at            6.166815        5.678373        6.185059
100009613_at        4.443027        4.773199        4.393488
100009676_at        5.836522        6.143398        5.898364
10001_at            6.330018        5.601745        6.137984

> anodat[1:8, 1:3]
               V1   V2    V3
1        SampleID   GA Batch
2 Tarca_001_P1A01   11     1
3 Tarca_013_P1B01 15.3     1
4 Tarca_025_P1C01 21.7     1
5 Tarca_037_P1D01 26.7     1
6 Tarca_049_P1E01 31.3     1
7 Tarca_061_P1F01 32.1     1
8 Tarca_051_P1E03 19.7     1

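Goal:

I want to see how the genes in each sample correlate with the GA value of the corresponding sample in the annotation data, and then generate a sub expression matrix that keeps only the genes highly correlated with the target observation data anodat$GA.

My attempt: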

gene_corrs <- function(expr_mat, anno_mat){
    stopifnot(ncol(expr_mat)==nrow(anno_mat))
    res <- list()
    lapply(colnames(expr_mat), function(x){
        lapply(x, rownames(y){
            if(colnames(x) %in% rownames(anno_mat)){
                cor_mat <- stats::cor(y, anno_mat$GA, method = "pearson")
                ncor <- ncol(cor_mat)
                cmatt <- col(cor_mat)
                ord <- order(-cmat, cor_mat, decreasing = TRUE)- (ncor*cmatt - ncor)
                colnames(ord) <- colnames(cor_mat)
                res <- cbind(ID=c(cold(ord), ID2=c(ord)))
                res <- as.data.frame(cbind(out, cor=cor_mat[res]))
                res <- cbind(res, cor=cor_mat[out])
                res <- as.dara.frame(res)
            }
        })
    })
    return(res)
}

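However, my implementation above does not return the result I expected. I need to filter the genes by finding those that are strongly correlated with anodat$GA.

Another attempt:

I have read several posts about similar problems, and some people discussed using the limma package. Here is my attempt with limma, where I use anodat$GA as a covariate to fit a limma linear model: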

library(limma)
fit <- limma::lmFit(expr_mat, design = model.matrix( ~ 0 + anodat$GA)
fit <- eBayes(fit)
topTable(fit, coef=2)

I then expected to get a correlation matrix from the code above, and wanted to do the following to obtain the filtered sub expression matrix:

idx <- which( (abs(cor) > 0.8) & (upper.tri(cor)), arr.ind=TRUE)
idx <- unique(c(idx[, 1],idx[, 2])
correlated.genes <- matrix[idx, ]

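But I still did not get the right answer. I am confident about using the limma approach, but I cannot figure out what is wrong with the code above. Can anyone point out how to make this work? Is there an efficient way to do this?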

Answer 1

I don't have your data, so this is hard to double-check, but in the abstract I would try this:

library(matrixTests)
cors <- row_cor_pearson(expr_mat, anodat$GA)

which(cors$cor > 0.9)  # indices of genes with correlation > 0.9
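
For completeness, here is a minimal base-R sketch of the same idea that also produces the filtered sub expression matrix asked for. It is not tested on the real data. The simulated `expr_mat` and `anodat`, the sample IDs, the gene names and the 0.8 cutoff are all assumptions for illustration. The printed `anodat` has its header in row 1, which suggests the table was read without a header, so the sketch promotes that row to column names and converts GA to numeric. It also aligns samples by ID rather than by position, since the printed annotation rows do not follow the column order of the expression matrix.

```r
# Simulated stand-ins for the question's objects (hypothetical values).
set.seed(1)
sample_ids <- paste0("S", 1:10)
ga <- seq(11, 38, length.out = 10)

# Expression matrix: genes in rows, samples in columns.
expr_mat <- matrix(rnorm(5 * 10), nrow = 5,
                   dimnames = list(paste0("gene", 1:5, "_at"), sample_ids))
expr_mat["gene1_at", ] <- 0.2 * ga + rnorm(10, sd = 0.1)   # rises with GA
expr_mat["gene2_at", ] <- -0.2 * ga + rnorm(10, sd = 0.1)  # falls with GA

# Annotation table read without a header: row 1 holds the column names,
# and the samples are in a different order than in expr_mat.
anodat <- data.frame(V1 = c("SampleID", rev(sample_ids)),
                     V2 = c("GA", as.character(rev(ga))),
                     V3 = c("Batch", rep("1", 10)),
                     stringsAsFactors = FALSE)

# 1. Promote the first row to column names and make GA numeric.
anno <- anodat[-1, ]
colnames(anno) <- as.character(unlist(anodat[1, ]))
anno$GA <- as.numeric(anno$GA)

# 2. Align annotation rows to the expression matrix columns by sample ID.
idx <- match(colnames(expr_mat), anno$SampleID)
stopifnot(!anyNA(idx))
ga_aligned <- anno$GA[idx]

# 3. Pearson correlation of each gene (row) with GA.
gene_cor <- cor(t(expr_mat), ga_aligned, method = "pearson")[, 1]

# 4. Keep genes whose absolute correlation exceeds the cutoff.
cutoff <- 0.8
sub_expr <- expr_mat[abs(gene_cor) > cutoff, , drop = FALSE]
```

Using `abs()` keeps negatively correlated genes as well; drop it if only positive correlations matter. On the limma attempt: as far as I can tell, `model.matrix(~ 0 + anodat$GA)` yields a single-column design, so `coef=2` in `topTable()` has nothing to select, and limma reports moderated t-statistics rather than correlation coefficients, so it will not give you a correlation matrix to threshold.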
