Genomic coordinates of HGNC gene names
I want to get the coordinates of the human genes in my list (HGNC gene symbols) using the GenomicFeatures and TxDb.Hsapiens.UCSC.hg19.knownGene R packages from Bioconductor.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
my_genes <- c("INO80", "NASP", "INO80D", "SMARCA1")
# HGNC symbols passed as GENEID keys: this is the call that fails
select(txdb, keys = my_genes,
       columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
       keytype = "GENEID")
However, it doesn't work because txdb doesn't take HGNC identifiers. How can this be solved? I couldn't find any keytype that supports HGNC, and I'm not sure how to match the HGNC symbols I have to the GENEID values in txdb.
I am not familiar with TxDb or the kinds of attributes it accepts or includes.
I can offer you an alternative approach using the biomaRt package, though, which also accepts HGNC symbols.
library(biomaRt)
my_genes = c("INO80","NASP","INO80D","SMARCA1")
# create a mart object
m <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
# df is a data.frame with all the requested info
df <- getBM(mart = m,
            attributes = c('hgnc_symbol', 'description', 'chromosome_name',
                           'start_position', 'end_position', 'strand',
                           'ensembl_gene_id'),
            filters = 'hgnc_symbol', values = my_genes)
It has a ton of attributes to choose from, which you can list with a simple:
listAttributes(m) # our current dataset
For more info, check ??biomaRt
Hope this helps.
This happens because txdb is built for transcripts: it doesn't have the (HGNC) gene symbol, but it does have the Entrez ID.
First, we need to map the gene symbols to Entrez IDs.
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
myGeneSymbols <- select(org.Hs.eg.db,
                        keys = c("INO80", "NASP", "INO80D", "SMARCA1"),
                        columns = c("SYMBOL", "ENTREZID"),
                        keytype = "SYMBOL")
# SYMBOL ENTREZID
# 1 INO80 54617
# 2 NASP 4678
# 3 INO80D 54891
# 4 SMARCA1 6594
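Before moving on, it may be worth checking that every symbol actually mapped. The sketch below assumes that unmatched symbols come back with NA in the ENTREZID column (that is how I would expect select to behave when at least one key is valid, but verify it on your version). The data.frame is a stand-in for the lookup result, and "NOTAGENE" is a made-up symbol used only for illustration.

```r
# Stand-in for the data.frame returned by the symbol lookup above;
# "NOTAGENE" is a hypothetical symbol with no Entrez match.
myGeneSymbols <- data.frame(
  SYMBOL   = c("INO80", "NASP", "NOTAGENE"),
  ENTREZID = c("54617", "4678", NA),
  stringsAsFactors = FALSE
)

# Report symbols that did not map to an Entrez ID
unmapped <- myGeneSymbols$SYMBOL[is.na(myGeneSymbols$ENTREZID)]
if (length(unmapped) > 0) {
  message("No Entrez ID found for: ", paste(unmapped, collapse = ", "))
}

# Keep only the mapped rows for the next lookup
mapped <- myGeneSymbols[!is.na(myGeneSymbols$ENTREZID), ]
```

Passing mapped$ENTREZID instead of the full column to the next step avoids sending NA keys to txdb.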
Then we can subset txdb:
myGeneSymbolsTx <- select(TxDb.Hsapiens.UCSC.hg19.knownGene,
                          keys = myGeneSymbols$ENTREZID,
                          columns = c("GENEID", "TXID", "TXCHROM", "TXSTART", "TXEND"),
                          keytype = "GENEID")
# GENEID TXID TXCHROM TXSTART TXEND
# 1 54617 55599 chr15 41267988 41280172
# 2 54617 55600 chr15 41271079 41408340
# 3 54617 55601 chr15 41271079 41408340
# 4 4678 1229 chr1 46049660 46079853
# 5 4678 1230 chr1 46049660 46081143
# 6 4678 1231 chr1 46049660 46084578
# 7 4678 1232 chr1 46049660 46084578
# 8 4678 1233 chr1 46049660 46084578
# 9 4678 1234 chr1 46067733 46075197
# 10 4678 1235 chr1 46077135 46084578
# 11 54891 12593 chr2 206858445 206950906
# 12 6594 77970 chrX 128580478 128657460
# 13 6594 77971 chrX 128580478 128657460
# 14 6594 77972 chrX 128580740 128657460
# 15 6594 77973 chrX 128580740 128657460
If required, we can then add the gene symbol to the table using merge:
res <- merge(myGeneSymbols, myGeneSymbolsTx, by.x = "ENTREZID", by.y = "GENEID")
# ENTREZID SYMBOL TXID TXCHROM TXSTART TXEND
# 1 4678 NASP 1229 chr1 46049660 46079853
# 2 4678 NASP 1230 chr1 46049660 46081143
# 3 4678 NASP 1231 chr1 46049660 46084578
# 4 4678 NASP 1232 chr1 46049660 46084578
# 5 4678 NASP 1233 chr1 46049660 46084578
# 6 4678 NASP 1234 chr1 46067733 46075197
# 7 4678 NASP 1235 chr1 46077135 46084578
# 8 54617 INO80 55599 chr15 41267988 41280172
# 9 54617 INO80 55600 chr15 41271079 41408340
# 10 54617 INO80 55601 chr15 41271079 41408340
# 11 54891 INO80D 12593 chr2 206858445 206950906
# 12 6594 SMARCA1 77970 chrX 128580478 128657460
# 13 6594 SMARCA1 77971 chrX 128580478 128657460
# 14 6594 SMARCA1 77972 chrX 128580740 128657460
# 15 6594 SMARCA1 77973 chrX 128580740 128657460
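Note that the table above has one row per transcript, not per gene. If a single range per gene is what you are after, one option is to take the smallest start and largest end across each gene's transcripts. This is only a rough sketch in base R: treating the span of all transcripts as "the gene coordinates" is my assumption, the name gene_ranges is mine, and the data.frame holds just a few rows copied from the merged table so that the snippet runs on its own.

```r
# A few rows copied from the merged transcript table above
res <- data.frame(
  SYMBOL  = c("NASP", "NASP", "INO80", "INO80", "INO80D", "SMARCA1", "SMARCA1"),
  TXCHROM = c("chr1", "chr1", "chr15", "chr15", "chr2", "chrX", "chrX"),
  TXSTART = c(46049660, 46077135, 41267988, 41271079, 206858445,
              128580478, 128580740),
  TXEND   = c(46079853, 46084578, 41280172, 41408340, 206950906,
              128657460, 128657460),
  stringsAsFactors = FALSE
)

# Smallest transcript start and largest transcript end per gene
gene_start <- aggregate(TXSTART ~ SYMBOL + TXCHROM, data = res, FUN = min)
gene_end   <- aggregate(TXEND ~ SYMBOL + TXCHROM, data = res, FUN = max)

# One row per gene: SYMBOL, TXCHROM, TXSTART, TXEND
gene_ranges <- merge(gene_start, gene_end, by = c("SYMBOL", "TXCHROM"))
gene_ranges
```

Grouping by TXCHROM as well as SYMBOL keeps a gene annotated on more than one chromosome from being collapsed into a single meaningless range.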