
How to reduce dimension of gene expression matrix by calculating correlation coefficients?

I am interested in finding Pearson correlation coefficients for a list of genes. Basically, I have an Affymetrix gene-level expression matrix (genes in the rows, sample IDs in the columns), and annotation data for the microarray experiment, with sample IDs in the rows and descriptive variables in the columns.

data:

> expr_mat[1:8, 1:3]
             Tarca_001_P1A01 Tarca_003_P1A03 Tarca_004_P1A04
1_at                6.062215        6.125023        5.875502
10_at               3.796484        3.805305        3.450245
100_at              5.849338        6.191562        6.550525
1000_at             3.567779        3.452524        3.316134
10000_at            6.166815        5.678373        6.185059
100009613_at        4.443027        4.773199        4.393488
100009676_at        5.836522        6.143398        5.898364
10001_at            6.330018        5.601745        6.137984

> anodat[1:8, 1:3]
               V1   V2    V3
1        SampleID   GA Batch
2 Tarca_001_P1A01   11     1
3 Tarca_013_P1B01 15.3     1
4 Tarca_025_P1C01 21.7     1
5 Tarca_037_P1D01 26.7     1
6 Tarca_049_P1E01 31.3     1
7 Tarca_061_P1F01 32.1     1
8 Tarca_051_P1E03 19.7     1

goal:

I want to see how each gene's expression across samples correlates with the GA value of the corresponding samples in the annotation data, and then generate a sub-expression matrix that keeps only the genes highly correlated with the target variable anodat$GA.
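One thing worth checking before any correlation is computed: in the preview above, the annotation table has columns V1, V2, V3 and the real header sits in row 1, so as printed there would be no GA column, and the values would be text. The sample order also differs from the columns of the expression matrix. A minimal sketch of one way to clean this up, on toy data; the names anodat_raw, ga and ga_aligned are made up for illustration:

```r
# Toy stand-in for the annotation table as printed: the header landed in
# row 1, so every column is character and there is no `GA` column yet.
anodat_raw <- data.frame(
  V1 = c("SampleID", "Tarca_001_P1A01", "Tarca_013_P1B01", "Tarca_025_P1C01"),
  V2 = c("GA", "11", "15.3", "21.7"),
  V3 = c("Batch", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names and drop it from the data.
anodat <- anodat_raw[-1, , drop = FALSE]
colnames(anodat) <- as.character(unlist(anodat_raw[1, ]))
rownames(anodat) <- NULL

# GA was read as text; convert it to numeric before correlating.
anodat$GA <- as.numeric(anodat$GA)

# Named vector mapping sample ID to GA.
ga <- setNames(anodat$GA, anodat$SampleID)

# Toy expression matrix whose columns are in a different sample order.
expr_mat <- matrix(
  c(6.1, 3.8, 5.8, 6.0, 3.5, 6.5, 5.9, 3.7, 6.2),
  nrow = 3,
  dimnames = list(
    c("1_at", "10_at", "100_at"),
    c("Tarca_025_P1C01", "Tarca_001_P1A01", "Tarca_013_P1B01")
  )
)

# Reorder GA so that element i belongs to column i of expr_mat.
ga_aligned <- ga[colnames(expr_mat)]
stopifnot(!anyNA(ga_aligned))  # fails loudly if a sample has no annotation
```

If the file is re-read from disk, passing header = TRUE to read.table or read.csv would avoid the manual header fix.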

my attempt:

gene_corrs <- function(expr_mat, anno_mat){
    stopifnot(ncol(expr_mat)==nrow(anno_mat))
    res <- list()
    lapply(colnames(expr_mat), function(x){
        lapply(x, rownames(y){
            if(colnames(x) %in% rownames(anno_mat)){
                cor_mat <- stats::cor(y, anno_mat$GA, method = "pearson")
                ncor <- ncol(cor_mat)
                cmatt <- col(cor_mat)
                ord <- order(-cmat, cor_mat, decreasing = TRUE)- (ncor*cmatt - ncor)
                colnames(ord) <- colnames(cor_mat)
                res <- cbind(ID=c(cold(ord), ID2=c(ord)))
                res <- as.data.frame(cbind(out, cor=cor_mat[res]))
                res <- cbind(res, cor=cor_mat[out])
                res <- as.dara.frame(res)
            }
        })
    })
    return(res)
}

However, my implementation above didn't return what I expected. I need to filter the genes, keeping those that have a strong correlation with anodat$GA.
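For comparison, here is a minimal sketch of what such a function could look like in base R. It is not a repair of the code above line by line. It assumes GA is available as a numeric vector named by sample ID (the argument ga here is hypothetical), and it treats strong negative correlations as "strong" too by thresholding on the absolute value; the cutoff of 0.8 is only an example.

```r
# Correlate every gene (row) with GA and keep the strongly correlated ones.
# `ga` is assumed to be a numeric vector named by sample ID.
gene_corrs <- function(expr_mat, ga, cutoff = 0.8) {
  # Use only samples present in both objects, in a common order.
  shared <- intersect(colnames(expr_mat), names(ga))
  stopifnot(length(shared) >= 3)
  x <- expr_mat[, shared, drop = FALSE]
  y <- ga[shared]

  # cor() works column-wise, so transpose to put genes in columns.
  r <- setNames(as.vector(stats::cor(t(x), y, method = "pearson")), rownames(x))

  # Genes with zero variance give NA; drop them along with weak correlations.
  keep <- !is.na(r) & abs(r) >= cutoff
  list(
    cor  = sort(r[keep], decreasing = TRUE),  # per-gene correlation with GA
    expr = expr_mat[keep, , drop = FALSE]     # filtered sub-expression matrix
  )
}

# Toy data: two genes track GA exactly, one is essentially unrelated.
ga <- c(s1 = 11, s2 = 15.3, s3 = 21.7, s4 = 26.7, s5 = 31.3)
expr_mat <- rbind(
  up_gene   = 2 + 0.1 * ga,
  down_gene = 9 - 0.2 * ga,
  flat_gene = c(5.1, 4.9, 5.0, 4.9, 5.1)
)
# Shuffle the columns to show that matching is done by sample ID.
expr_mat <- expr_mat[, c("s3", "s1", "s5", "s2", "s4")]

res <- gene_corrs(expr_mat, ga, cutoff = 0.8)
rownames(res$expr)
```

The nested lapply calls are not needed: cor() accepts a matrix and a vector and returns one coefficient per matrix column.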

Another attempt:

I read a few posts about similar issues, and some people discussed using the limma package. Here is my attempt using limma, with anodat$GA as a covariate in the linear model:

library(limma)
fit <- limma::lmFit(expr_mat, design = model.matrix( ~ 0 + anodat$GA))
fit <- eBayes(fit)
topTable(fit, coef=2)

I was expecting to get a correlation matrix from the above code, and would then like to do the following to get the filtered sub-expression matrix:

idx <- which( (abs(cor) > 0.8) & (upper.tri(cor)), arr.ind=TRUE)
idx <- unique(c(idx[, 1], idx[, 2]))
correlated.genes <- matrix[idx, ]

But I still didn't get the right answer. I am confident in the limma approach, but I couldn't figure out what went wrong in the code above. Can anyone point out how to make this work? Is there an efficient way to do this?
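A note on why the linear-model route does not yield a correlation matrix: a per-gene fit returns coefficients and test statistics, one row per gene, not gene-by-gene correlations, so upper.tri() has nothing square to work on. Also, a design of the form ~ 0 + GA has a single column, so there is no second coefficient to select; ~ GA gives an intercept plus a slope. For a model with an intercept and one covariate, the ordinary slope t-statistic and the Pearson r carry the same information: r = t / sqrt(t^2 + df), with df = n - 2. The sketch below checks this on made-up data using base R's lm() as a stand-in for the per-gene fit. It is not limma itself, and limma's eBayes() statistics are moderated, so the identity would hold only approximately for those.

```r
# Toy data: 4 genes x 6 samples with a made-up GA covariate.
set.seed(42)
ga <- c(11, 15.3, 21.7, 26.7, 31.3, 32.1)
expr_mat <- rbind(
  gene_a = 3 + 0.10 * ga + rnorm(6, sd = 0.2),
  gene_b = 7 - 0.05 * ga + rnorm(6, sd = 0.2),
  gene_c = rnorm(6, mean = 5, sd = 0.3),
  gene_d = 4 + 0.02 * ga + rnorm(6, sd = 0.5)
)

# Design with an intercept: columns "(Intercept)" and "ga".
design <- model.matrix(~ ga)
n_df <- length(ga) - ncol(design)  # residual degrees of freedom, n - 2

# Ordinary (unmoderated) slope t-statistic for each gene.
t_slope <- apply(expr_mat, 1, function(y) {
  summary(lm(y ~ ga))$coefficients["ga", "t value"]
})

# Convert t to r and compare with the directly computed correlation.
r_from_t <- t_slope / sqrt(t_slope^2 + n_df)
r_direct <- cor(t(expr_mat), ga)[, 1]

# A per-gene vector is filtered with a plain logical index.
strong <- names(r_direct)[abs(r_direct) > 0.8]
```

In other words, ranking genes by the slope statistic and ranking them by |r| give the same order in this simple design, so either route can drive the filtering step.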

I don't have your data, so it is hard to double-check, but in the abstract I would try this:

library(matrixTests)
cors <- row_cor_pearson(expr_mat, anodat$GA)

which(cors$cor > 0.9)  # indices of genes with correlation > 0.9
