I am attempting to run a GO Analysis in R (I have never done this analysis, so I am trying different packages), and I am struggling to find the problem with my code in the goseq package.
I start with this code which produces a list of the differentially expressed gene names:
de.genes <- rownames(res)[ which(res$padj < fdr.threshold & !is.na(res$padj)) ]
Then I try to run this code, based on page 7 of the vignette (https://bioconductor.org/packages/devel/bioc/vignettes/goseq/inst/doc/goseq.pdf):
pwf <- nullp(de.genes, "hg38","geneSymbol")
but I get the following error:
Can't find hg38/geneSymbol length data in genLenDataBase...
Found the annotation package, TxDb.Hsapiens.UCSC.hg38.knownGene
Trying to get the gene lengths from it.
Error in if (matched_frac == 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In grep(txdbPattern, installedPackages):argument 'pattern' has length > 1 and only the first element will be used
I found this forum: https://support.bioconductor.org/p/38580/ that says I need an "indicator variable" but I do not know what this is.
Any help with this error would be greatly appreciated, or if you know of any other GO packages that are easy to learn. Thanks!
You can check the supported databases. hg38 is listed, but it has no gene length data in geneLenDataBase:
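library(org.Hs.eg.db)
library(goseq)
# supportedOrganisms() returns the table of genome/ID combinations goseq knows about
supported <- supportedOrganisms()
supported[grep("hg38|hg19", supported$Genome), ]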
   Genome         Id  Id Description Lengths in geneLenDataBase GO Annotation Available
4    hg19  knownGene  Entrez Gene ID                       TRUE                    TRUE
36   hg19    ensGene Ensembl gene ID                       TRUE                    TRUE
81   hg19 geneSymbol     Gene Symbol                       TRUE                    TRUE
98   hg38                                                 FALSE                    TRUE
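You can get a rough idea of the results by using hg19 instead. Some genes will be missing or unmatched, but it should be OK. The first argument to nullp needs to be a named binary vector covering every gene you tested (1 = differentially expressed, 0 = not), not just a character vector of the DE gene names. For example: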
set.seed(111)
allgenes = keys(org.Hs.eg.db,keytype="SYMBOL")
de.genes = rbinom(100,1,0.3)
names(de.genes) = sample(allgenes,100)
It looks like this:
      GALNT5        TPRKB         CD48       OR52R1 LOC105372708 LOC112163649
           0            1            0            0            0            0
LOC105369203 LOC110121115 LOC105377654 LOC105371502 LOC101929964        HPC14
           0            0            0            0            0            0
    IGHD4-17 LOC101927993        HINT1         BCC3      RPL18P3 LOC108281192
           0            0            0            0            0            1
   RNU6-793P          JUN
           0            0
This will be ok:
res = nullp(de.genes,"hg19","geneSymbol")
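With your own data, you would build that vector from the results table rather than from random draws. Here is a minimal sketch. It assumes `res` is a DESeq2-style table with gene symbols as row names and a `padj` column, as in your question. The toy `res`, the 0.05 threshold, and the name `gene.vector` are made up for illustration.

```r
# Toy stand-in for a differential expression results table (assumption:
# row names are gene symbols and there is an adjusted p-value column `padj`).
res <- data.frame(
  padj = c(0.001, 0.20, NA, 0.03, 0.70),
  row.names = c("JUN", "HINT1", "CD48", "TPRKB", "GALNT5")
)
fdr.threshold <- 0.05  # illustrative cutoff, not taken from the question

# TRUE where the gene passes the threshold; genes with NA padj count as not DE.
is.de <- !is.na(res$padj) & res$padj < fdr.threshold

# Named 0/1 vector with one entry per tested gene, the shape nullp expects.
gene.vector <- as.integer(is.de)
names(gene.vector) <- rownames(res)
print(gene.vector)

# With a real results table you would then call (requires goseq and its length data):
# pwf <- nullp(gene.vector, "hg19", "geneSymbol")
```

Whether genes with an NA `padj` should stay in the vector as 0 or be dropped from the background altogether is a judgment call. The sketch keeps them as 0.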