goseq package in R “missing value where TRUE/FALSE needed” error

I am attempting to run a GO Analysis in R (I have never done this analysis, so I am trying different packages), and I am struggling to find the problem with my code in the goseq package.

I start with this code which produces a list of the differentially expressed gene names:

 de.genes <- rownames(res)[ which(res$padj < fdr.threshold & !is.na(res$padj)) ]

Then I try to run this code, based on page 7 of the vignette (https://bioconductor.org/packages/devel/bioc/vignettes/goseq/inst/doc/goseq.pdf):

 pwf <- nullp(de.genes, "hg38","geneSymbol")

but I get the following error:

 Can't find hg38/geneSymbol length data in genLenDataBase...
 Found the annotation package, TxDb.Hsapiens.UCSC.hg38.knownGene
 Trying to get the gene lengths from it.
 Error in if (matched_frac == 0) { : missing value where TRUE/FALSE needed
 In addition: Warning message:
 In grep(txdbPattern, installedPackages) :
   argument 'pattern' has length > 1 and only the first element will be used

I found this forum: https://support.bioconductor.org/p/38580/ that says I need an "indicator variable" but I do not know what this is.

Any help with this error would be greatly appreciated, or if you know of any other GO packages that are easy to learn. Thanks!

You can check which genomes goseq supports. hg38 is listed, but it has no gene length data in geneLenDataBase:

library(org.Hs.eg.db)
library(goseq)

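# supportedOrganisms() returns the table of genomes and gene ID formats goseq knows about
supported = supportedOrganisms()
supported[grep("hg38|hg19",supported$Genome),]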
   Genome         Id  Id Description Lengths in geneLeneDataBase
4    hg19  knownGene  Entrez Gene ID                        TRUE
36   hg19    ensGene Ensembl gene ID                        TRUE
81   hg19 geneSymbol     Gene Symbol                        TRUE
98   hg38                                                  FALSE
   GO Annotation Available
4                     TRUE
36                    TRUE
81                    TRUE
98                    TRUE

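You can get a rough idea of the result by using hg19 instead. Some genes will be missing or unmatched, but it should be OK. The other problem is the input: nullp does not take a character vector of gene names. It needs a named binary vector (the "indicator variable"), where the names are the gene identifiers and each value is 1 if the gene is differentially expressed and 0 if it is not. For example: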

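set.seed(111)
# all gene symbols known to org.Hs.eg.db
allgenes = keys(org.Hs.eg.db,keytype="SYMBOL")
# random 0/1 indicator for 100 genes (1 = differentially expressed)
de.genes = rbinom(100,1,0.3)
# name each entry with a randomly drawn gene symbol
names(de.genes) = sample(allgenes,100)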

It looks like this:

      GALNT5        TPRKB         CD48       OR52R1 LOC105372708 LOC112163649 
           0            1            0            0            0            0 
LOC105369203 LOC110121115 LOC105377654 LOC105371502 LOC101929964        HPC14 
           0            0            0            0            0            0 
    IGHD4-17 LOC101927993        HINT1         BCC3      RPL18P3 LOC108281192 
           0            0            0            0            0            1 
   RNU6-793P          JUN 
           0            0 

This will be ok:

res = nullp(de.genes,"hg19","geneSymbol")
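
To connect this to the question, here is a minimal sketch of how the named 0/1 vector could be built from a results table like the asker's `res`. The toy `res` data frame, its padj values, and the name `gene.vector` are made up for illustration. The sketch also assumes the row names are gene symbols and that every row of `res` counts as a tested gene; you may prefer to drop rows with NA padj from the background first.

```r
# Toy stand-in for the results table in the question (values are invented).
# Row names are assumed to be gene symbols.
res <- data.frame(
  padj = c(0.001, 0.20, NA, 0.03, 0.70),
  row.names = c("JUN", "HINT1", "CD48", "TPRKB", "GALNT5")
)
fdr.threshold <- 0.05

# Same selection as in the question: significant genes, NA padj excluded.
de.genes <- rownames(res)[which(res$padj < fdr.threshold & !is.na(res$padj))]

# Indicator vector over all tested genes: 1 = differentially expressed, 0 = not.
gene.vector <- as.integer(rownames(res) %in% de.genes)
names(gene.vector) <- rownames(res)

print(gene.vector)

# With goseq loaded, this vector (not de.genes itself) is what nullp expects:
# pwf <- nullp(gene.vector, "hg19", "geneSymbol")
```

The nullp call is left commented out because a handful of toy genes is not enough for a meaningful fit; with a real results table the vector would cover thousands of genes.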
