goseq package in R “missing value where TRUE/FALSE needed” error
I am attempting to run a GO analysis in R (I have never done this analysis, so I am trying different packages), and I am struggling to find the problem with my code using the goseq package.
I start with this code, which produces a vector of the differentially expressed gene names:
de.genes <- rownames(res)[ which(res$padj < fdr.threshold & !is.na(res$padj)) ]
Then I try to run this code (based on page 7 of the vignette, https://bioconductor.org/packages/devel/bioc/vignettes/goseq/inst/doc/goseq.pdf):
pwf <- nullp(de.genes, "hg38","geneSymbol")
but I get the following error:
Can't find hg38/geneSymbol length data in genLenDataBase...
Found the annotation package, TxDb.Hsapiens.UCSC.hg38.knownGene
Trying to get the gene lengths from it.
Error in if (matched_frac == 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In grep(txdbPattern, installedPackages):argument 'pattern' has length > 1 and only the first element will be used
I found this forum post, https://support.bioconductor.org/p/38580/, which says I need an "indicator variable", but I do not know what this is.
Any help with this error would be greatly appreciated, as would suggestions for other GO packages that are easy to learn. Thanks!
You can check the supported databases. hg38 is not among those with gene length data:
library(org.Hs.eg.db)
library(goseq)
# Table of genomes and gene ID formats that goseq knows about
supported <- supportedOrganisms()
supported[grep("hg38|hg19", supported$Genome), ]
   Genome         Id  Id Description Lengths in geneLeneDataBase GO Annotation Available
4    hg19  knownGene  Entrez Gene ID                        TRUE                    TRUE
36   hg19    ensGene Ensembl gene ID                        TRUE                    TRUE
81   hg19 geneSymbol     Gene Symbol                        TRUE                    TRUE
98   hg38                                                  FALSE                    TRUE
You can get a rough idea of what the result looks like by using hg19. Some genes will be missing or unmatched, but that should be OK. The input needs to be a named binary vector (1 = differentially expressed, 0 = not), for example:
set.seed(111)
allgenes = keys(org.Hs.eg.db,keytype="SYMBOL")
de.genes = rbinom(100,1,0.3)
names(de.genes) = sample(allgenes,100)
It looks like this:
      GALNT5        TPRKB         CD48       OR52R1 LOC105372708 LOC112163649
           0            1            0            0            0            0
LOC105369203 LOC110121115 LOC105377654 LOC105371502 LOC101929964        HPC14
           0            0            0            0            0            0
    IGHD4-17 LOC101927993        HINT1         BCC3      RPL18P3 LOC108281192
           0            0            0            0            0            1
   RNU6-793P          JUN
           0            0
This will then work:
res = nullp(de.genes,"hg19","geneSymbol")
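The random vector above only illustrates the required shape. For the data in the question, the same kind of named 0/1 vector (the "indicator variable") can be derived from the results table itself, covering every tested gene rather than only the significant ones. The following is a minimal sketch in base R: the toy `res` data frame, its values, the 0.05 cutoff, and the name `gene.vector` are assumptions made for illustration, not taken from the original post.

```r
# Toy stand-in for a DESeq2-style results table (hypothetical values).
res <- data.frame(
  padj = c(0.001, 0.20, NA, 0.04, 0.90),
  row.names = c("JUN", "HINT1", "CD48", "TPRKB", "GALNT5")
)
fdr.threshold <- 0.05  # assumed cutoff; the question does not state its value

# 1 = differentially expressed, 0 = not. A gene with NA padj counts as not DE.
gene.vector <- as.integer(!is.na(res$padj) & res$padj < fdr.threshold)

# Name each entry by its gene so that every tested gene is represented.
names(gene.vector) <- rownames(res)

print(gene.vector)

# With goseq installed, this vector (rather than the character vector of
# DE gene names) would be passed as the first argument:
# pwf <- nullp(gene.vector, "hg19", "geneSymbol")
```

Unlike the character vector from the question, this keeps the non-significant genes as zeros, which gives goseq the background set it needs to fit the length bias.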