Calculated contrasts in DESeq2: difference between manual coefficients and DESeq2 automatic contrast

I have the following model in DESeq2, where I am blocking on replicate.

library(DESeq2)

# Block on replicate, then test the effect of sample_name
dds <- DESeqDataSetFromMatrix(countData = CPEB4_featureCounts_3utr_matrix,
                              colData = CPEB4_sample_list,
                              design = ~ replicate + sample_name)
dds <- DESeq(dds)

This is the metadata:

          sample_name replicate
0195_2022       INPUT         4
0196_2022         IgG         4
0197_2022       CPEB4         4
0198_2022       INPUT         5
0199_2022         IgG         5
0200_2022       CPEB4         5
2125_2021       INPUT         1
2126_2021         IgG         1
2127_2021       CPEB4         1
2235_2021       INPUT         2
2237_2021       CPEB4         2
2238_2021       INPUT         3
2239_2021         IgG         3
2240_2021       CPEB4         3

I want to extract the contrast "CPEB4 - IgG".

I can do it by using the results function like this:

CPEB4vsIgG <- results(dds, contrast=c("sample_name","CPEB4","IgG"))

I get the following DEGs:

summary(CPEB4vsIgG)
out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 598, 3.5%
LFC < 0 (down)     : 30, 0.17%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

However, I could also calculate the contrast coefficients manually (I usually do this when I have more complex contrasts), like this:

mod_mat <- model.matrix(design(dds), colData(dds))
CPEB4 <- colMeans(mod_mat[dds$sample_name == "CPEB4", ])
IgG <- colMeans(mod_mat[dds$sample_name == "IgG", ])
CPEB4vsIgG_2 <- results(dds,  contrast = (CPEB4 - IgG))

However, with this code I get a slightly different list of DEGs:

summary(CPEB4vsIgG_2)
out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 672, 3.9%
LFC < 0 (down)     : 81, 0.47%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

If I check the coefficients for the two groups I am subtracting, everything looks fine:

CPEB4
     (Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
             1.0              0.2              0.2              0.2              0.2              0.0 
sample_nameINPUT 
             0.0 
IgG

     (Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
            1.00             0.00             0.25             0.25             0.25             1.00 
sample_nameINPUT 
            0.00 
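
One way to check this more closely is to look at the difference itself rather than at the two group vectors separately. The sketch below uses base R only and rebuilds the sample sheet from the table above; the object name `meta` and the explicit factor levels are assumptions, chosen to reproduce the column order printed above. Because replicate 2 has no IgG sample, the replicate entries do not cancel in `CPEB4 - IgG`. The last lines compare this against what I assume the character contrast amounts to when CPEB4 is the reference level (-1 on `sample_nameIgG`, 0 elsewhere), which is my reading of `results()` and not something confirmed in this post.

```r
# Sample sheet rebuilt from the table in the question (`meta` is a made-up name).
# Levels are set explicitly so that CPEB4 is the reference level, which matches
# the printed columns sample_nameIgG and sample_nameINPUT.
meta <- data.frame(
  sample_name = factor(
    c("INPUT", "IgG", "CPEB4", "INPUT", "IgG", "CPEB4", "INPUT", "IgG", "CPEB4",
      "INPUT", "CPEB4", "INPUT", "IgG", "CPEB4"),
    levels = c("CPEB4", "IgG", "INPUT")),
  replicate = factor(c(4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 3, 3, 3))
)

mod_mat <- model.matrix(~ replicate + sample_name, data = meta)

# Average design row per group, as in the manual approach
cpeb4 <- colMeans(mod_mat[meta$sample_name == "CPEB4", ])
igg   <- colMeans(mod_mat[meta$sample_name == "IgG", ])

# Replicate 2 has no IgG sample, so the replicate entries do not cancel
manual_contrast <- cpeb4 - igg
print(round(manual_contrast, 2))

# Assumed equivalent of the character contrast: -1 on sample_nameIgG only
simple_contrast <- setNames(rep(0, ncol(mod_mat)), colnames(mod_mat))
simple_contrast["sample_nameIgG"] <- -1

# Whatever is left over here is where the two contrasts differ
print(round(manual_contrast - simple_contrast, 2))
```

Under these assumptions, the only entries that differ between the two contrasts are the replicate columns (0.2 for `replicate2`, -0.05 for `replicate3` to `replicate5`).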

Why is there this difference?

If I create a model that does not take the replicate into account, the two approaches give the same results.
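
That observation can be reproduced on the design matrix alone. In this base R sketch (the `meta` data frame is a hypothetical reconstruction of the sample sheet, keeping only the group sizes from the table: 5 INPUT, 4 IgG, 5 CPEB4), dropping `replicate` from the formula leaves only the intercept and the `sample_name` columns, so the averaged difference has no leftover terms.

```r
# Group labels only; sizes taken from the metadata table (`meta` is a made-up name)
meta <- data.frame(
  sample_name = factor(
    rep(c("INPUT", "IgG", "CPEB4"), times = c(5, 4, 5)),
    levels = c("CPEB4", "IgG", "INPUT"))
)

# Design without the replicate blocking factor
mod_mat <- model.matrix(~ sample_name, data = meta)

cpeb4 <- colMeans(mod_mat[meta$sample_name == "CPEB4", ])
igg   <- colMeans(mod_mat[meta$sample_name == "IgG", ])

# Only sample_nameIgG is nonzero in the difference
no_block_contrast <- cpeb4 - igg
print(no_block_contrast)
```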

You can find the answer to this question here: https://support.bioconductor.org/p/9148941/
