
Missing values in lmFit [limma R package]

[This question is specific to bioinformatics. There are posts elsewhere but I couldn't find a satisfactory answer.]

If I have gene/protein expression data with missing values (NA), how does lmFit from the limma package handle them? Note that the missing values are only in the data matrix, not in the design matrix.

Here is a simulated, working example that illustrates my question:

library(limma)
my_genes <- matrix(rnorm(9000, -10, 10), ncol = 4)
my_genes <- as.data.frame(my_genes)
rownames(my_genes) <- paste("Gene", 1:nrow(my_genes))
## Randomly introduce NAs (~5% of values); assign the result back so the NAs are kept
my_genes[] <- lapply(my_genes, function(x) {x[sample(c(TRUE, NA), prob = c(0.95, 0.05), size = length(x), replace = TRUE)]})
tx <- 1:2 # suppose treatment is columns 1 & 2
ctrls <- 3:4 # suppose controls is columns 3 & 4
my_design <- model.matrix(~ factor(c(1, 1, 0, 0)))
my_design
fit <- lmFit(my_genes, my_design)
fit <- eBayes(fit)
## The fit object has no logFC element; coefficient 2 is the treatment-vs-control log fold change
plot(fit$coefficients[, 2], -log10(fit$p.value[, 2]))

If you find any websites / posts that can help, feel free to share with a post or comment.

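This post on Cross Validated answers my own question in detail. In short, lmFit deals with missing values much as lm does. For each gene (row of the data matrix), the samples with missing values are dropped before the model is fitted, comparable to na.omit / na.exclude in lm, i.e. "case-wise deletion." The gene itself stays in the analysis, just with fewer observations and fewer residual degrees of freedom.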
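To make the per-gene deletion concrete, here is a minimal sketch with one made-up gene (the values in y are hypothetical). It fits the model by hand on the observed samples only, checks that lm gives the same coefficients, and, if limma is installed, prints the lmFit coefficients alongside for comparison. It illustrates the behaviour described above and is not limma's internal code.

```r
# One hypothetical gene measured in 4 samples; the second treated sample is missing.
y <- c(5.1, NA, 3.9, 4.2)
group <- factor(c(1, 1, 0, 0), levels = c(0, 1))
design <- model.matrix(~ group)

# Case-wise deletion by hand: keep only the observed samples for this gene.
obs <- is.finite(y)
manual <- lm.fit(design[obs, , drop = FALSE], y[obs])
manual_coef <- unname(manual$coefficients)

# lm() drops the incomplete case through its na.action (na.omit by default).
lm_coef <- unname(coef(lm(y ~ group)))

# Optional comparison with lmFit, only run when limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(matrix(y, nrow = 1), design)
  print(rbind(manual = manual_coef, lm = lm_coef,
              lmFit = unname(fit$coefficients[1, ])))
  print(fit$df.residual)  # residual df reflects the reduced sample count
}
```

If the three rows of coefficients agree, the gene was fitted on its three observed samples rather than being discarded.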

Alternatively: though it's not an ideal solution, it might be appropriate to just impute the missing gene-expression values, for example with the impute.knn function in the impute Bioconductor package.
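A rough sketch of that imputation route, assuming the impute package is installed from Bioconductor. The matrix expr and its dimensions are made up for illustration, and k = 10 is simply the function's default number of neighbours, not a recommendation.

```r
set.seed(1)
# Hypothetical expression matrix: 100 genes (rows) x 4 samples (columns).
expr <- matrix(rnorm(400, mean = 8), ncol = 4,
               dimnames = list(paste("Gene", 1:100), paste0("S", 1:4)))
expr[sample(length(expr), 20)] <- NA  # knock out 5% of the values

if (requireNamespace("impute", quietly = TRUE)) {
  # impute.knn expects genes in rows and samples in columns;
  # the completed matrix is returned in the $data element.
  imputed <- impute::impute.knn(expr, k = 10)$data
  print(sum(is.na(imputed)))
}
```

The imputed matrix can then be passed to lmFit in place of the original data. Keep in mind that imputed values are treated as if they had been observed, so the residual degrees of freedom no longer reflect the missingness.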
