Genomic coordinates for HGNC gene names

Question

I want to get the coordinates of the human genes in my list (identified by HGNC gene symbols) using the GenomicFeatures and TxDb.Hsapiens.UCSC.hg19.knownGene R packages from Bioconductor.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

my_genes <- c("INO80", "NASP", "INO80D", "SMARCA1")

# This call fails: the keys are HGNC symbols, not GENEID values
select(txdb, keys = my_genes,
       columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
       keytype = "GENEID")

However, it doesn't work because txdb doesn't take HGNC identifiers. How can this be solved? I couldn't find a keytype that supports HGNC, and I'm not sure how to match the HGNC IDs I have to the GENEID values in txdb.

Answer 1

I am not familiar with TxDb or the kinds of attributes it accepts and includes.
I can offer an alternative approach using the biomaRt package, though, which accepts HGNC symbols as well.

library(biomaRt)

my_genes = c("INO80","NASP","INO80D","SMARCA1")

m <- useMart('ensembl', dataset='hsapiens_gene_ensembl') # create a mart object
df <- getBM(mart=m, attributes=c('hgnc_symbol', 'description', 'chromosome_name',
                                 'start_position', 'end_position', 'strand',
                                 'ensembl_gene_id'),
            filters='hgnc_symbol', values=my_genes) # where df is a data.frame with all your requested info

It has a ton of attributes to choose from, which you can list with a simple:

listAttributes(m) # our current dataset

For more info, check ??biomaRt.
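
One caveat worth checking: the question uses an hg19 TxDb, whereas the default Ensembl mart serves the current human assembly, so the coordinates returned above may not be on hg19. The sketch below assumes biomaRt's useEnsembl() with GRCh = 37 to query the GRCh37 archive instead. The wrapper name get_grch37_coords is hypothetical, and the call needs the biomaRt package and network access.

```r
# Hypothetical helper: gene-level coordinates on GRCh37 for HGNC symbols.
# Requires the Bioconductor package biomaRt and a network connection.
get_grch37_coords <- function(symbols) {
  # GRCh = 37 points biomaRt at the Ensembl GRCh37 archive
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl",
                              GRCh = 37)
  biomaRt::getBM(attributes = c("hgnc_symbol", "chromosome_name",
                                "start_position", "end_position", "strand"),
                 filters = "hgnc_symbol",
                 values = symbols,
                 mart = mart)
}

# Example call (not run here):
# get_grch37_coords(c("INO80", "NASP", "INO80D", "SMARCA1"))
```

Ensembl names chromosomes without the "chr" prefix (for example "15" rather than "chr15"), so the chromosome column may need converting before comparing with UCSC-style output.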

Hope this helps.

Answer 2

This fails because txdb is a transcript database: it does not store (HGNC) gene symbols, but it does store Entrez gene IDs, which is what its GENEID keytype expects.

First, we need to map the gene symbols to Entrez IDs.

library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

myGeneSymbols <- select(org.Hs.eg.db,
                        keys = c("INO80","NASP","INO80D","SMARCA1"),
                        columns = c("SYMBOL","ENTREZID"),
                        keytype = "SYMBOL")
#    SYMBOL ENTREZID
# 1   INO80    54617
# 2    NASP     4678
# 3  INO80D    54891
# 4 SMARCA1     6594
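
If a symbol in the list is not an official symbol, select() on org.Hs.eg.db will, as far as I know, return NA in the ENTREZID column for that key, so it may be worth filtering before the next step. A minimal base-R sketch on a mock mapping table follows; the symbol "NOTAGENE" is a made-up placeholder, and mapping stands in for myGeneSymbols.

```r
# Mock of a select() result; "NOTAGENE" is a made-up, unmapped symbol.
mapping <- data.frame(
  SYMBOL   = c("INO80", "NASP", "NOTAGENE"),
  ENTREZID = c("54617", "4678", NA),
  stringsAsFactors = FALSE
)

# Report symbols that did not map, then keep only the mapped rows.
unmapped <- mapping$SYMBOL[is.na(mapping$ENTREZID)]
if (length(unmapped) > 0) {
  message("No Entrez ID found for: ", paste(unmapped, collapse = ", "))
}
mapped <- mapping[!is.na(mapping$ENTREZID), ]
mapped$ENTREZID
```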

Then we can query txdb with those Entrez IDs:

myGeneSymbolsTx <- select(TxDb.Hsapiens.UCSC.hg19.knownGene,
                          keys = myGeneSymbols$ENTREZID,
                          columns = c("GENEID", "TXID", "TXCHROM", "TXSTART", "TXEND"),
                          keytype = "GENEID")
#    GENEID  TXID TXCHROM   TXSTART     TXEND
# 1   54617 55599   chr15  41267988  41280172
# 2   54617 55600   chr15  41271079  41408340
# 3   54617 55601   chr15  41271079  41408340
# 4    4678  1229    chr1  46049660  46079853
# 5    4678  1230    chr1  46049660  46081143
# 6    4678  1231    chr1  46049660  46084578
# 7    4678  1232    chr1  46049660  46084578
# 8    4678  1233    chr1  46049660  46084578
# 9    4678  1234    chr1  46067733  46075197
# 10   4678  1235    chr1  46077135  46084578
# 11  54891 12593    chr2 206858445 206950906
# 12   6594 77970    chrX 128580478 128657460
# 13   6594 77971    chrX 128580478 128657460
# 14   6594 77972    chrX 128580740 128657460
# 15   6594 77973    chrX 128580740 128657460

If required, we can then add the gene symbols to the table using merge:

res <- merge(myGeneSymbols, myGeneSymbolsTx, by.x = "ENTREZID", by.y = "GENEID")
#    ENTREZID  SYMBOL  TXID TXCHROM   TXSTART     TXEND
# 1      4678    NASP  1229    chr1  46049660  46079853
# 2      4678    NASP  1230    chr1  46049660  46081143
# 3      4678    NASP  1231    chr1  46049660  46084578
# 4      4678    NASP  1232    chr1  46049660  46084578
# 5      4678    NASP  1233    chr1  46049660  46084578
# 6      4678    NASP  1234    chr1  46067733  46075197
# 7      4678    NASP  1235    chr1  46077135  46084578
# 8     54617   INO80 55599   chr15  41267988  41280172
# 9     54617   INO80 55600   chr15  41271079  41408340
# 10    54617   INO80 55601   chr15  41271079  41408340
# 11    54891  INO80D 12593    chr2 206858445 206950906
# 12     6594 SMARCA1 77970    chrX 128580478 128657460
# 13     6594 SMARCA1 77971    chrX 128580478 128657460
# 14     6594 SMARCA1 77972    chrX 128580740 128657460
# 15     6594 SMARCA1 77973    chrX 128580740 128657460
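
The table above has one row per transcript. If a single range per gene is wanted, one option is to collapse the rows, taking the smallest start and the largest end for each gene. This is a minimal base-R sketch using a few rows copied from the output above; the data frame res_subset and the column names gene_start and gene_end are illustrative and not part of the original answer.

```r
# A few transcript rows copied from the merged output above.
res_subset <- data.frame(
  SYMBOL  = c("NASP", "NASP", "NASP", "INO80", "INO80", "INO80D"),
  TXCHROM = c("chr1", "chr1", "chr1", "chr15", "chr15", "chr2"),
  TXSTART = c(46049660, 46067733, 46077135, 41267988, 41271079, 206858445),
  TXEND   = c(46079853, 46075197, 46084578, 41280172, 41408340, 206950906),
  stringsAsFactors = FALSE
)

# Smallest transcript start and largest transcript end per gene.
gene_start <- aggregate(TXSTART ~ SYMBOL + TXCHROM, data = res_subset, FUN = min)
gene_end   <- aggregate(TXEND   ~ SYMBOL + TXCHROM, data = res_subset, FUN = max)

# Combine both summaries into one row per gene.
gene_coords <- merge(gene_start, gene_end, by = c("SYMBOL", "TXCHROM"))
names(gene_coords) <- c("SYMBOL", "CHROM", "gene_start", "gene_end")
gene_coords
```

If a GRanges object is preferred, GenomicFeatures also provides genes(txdb), which returns one range per gene keyed by Entrez ID.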

Answer 1 (1, 2018-09-07 12:14:36)

Answer 2 (1, accepted, 2018-09-10 07:25:44)
