
Entrez gene IDs from gene list using biomaRt

I am trying to convert a list of gene names to Entrez gene IDs.

For now I have this:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mapping <- getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
                                'entrezgene', 'hgnc_symbol'), mart = ensembl)

This creates a table with the Entrez gene IDs and names. However, how can I filter the IDs based on my gene list?

Here is an example of the gene name list: Gene names

It is just an Excel file with a couple of hundred gene names in total.

Hopefully someone can help me!

Data

Create a vector of gene names:

mygenes <- c("TNF", "IL6", "IL1B", "IL10", "CRP", "TGFB1", "CXCL8")
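Since the real list lives in an Excel file, here is a minimal sketch for turning a spreadsheet column into a clean vector like mygenes. It assumes the readxl package; the file name genes.xlsx, the column name Gene, and the helper clean_symbols() are all hypothetical and not part of biomaRt. Symbols are not upper-cased, because some valid HGNC symbols (for example the "orf" genes) contain lowercase letters.

```r
# Hypothetical helper: tidy gene symbols read from a spreadsheet column
clean_symbols <- function(x) {
  x <- trimws(as.character(x))   # handle factors and strip stray spaces
  x <- x[!is.na(x) & x != ""]    # drop empty or missing cells
  unique(x)                      # remove duplicate symbols
}

# Assumed usage with readxl (file and column names are hypothetical):
# library(readxl)
# sheet   <- read_excel("genes.xlsx")
# mygenes <- clean_symbols(sheet$Gene)

clean_symbols(c(" TNF", "IL6", "IL6", NA, "", "CXCL8 "))
# returns c("TNF", "IL6", "CXCL8")
```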

Retrieve information from BioMart:

library(biomaRt)

hsmart <- useMart(dataset = "hsapiens_gene_ensembl", biomart = "ensembl")

hsmart

# Object of class 'Mart':
#   Using the ENSEMBL_MART_ENSEMBL BioMart database
#   Using the hsapiens_gene_ensembl dataset

Map gene names to Ensembl gene IDs, transcript IDs, and Entrez IDs

You don't need to convert the whole database into a table of corresponding IDs for this. Passing filters = "hgnc_symbol" to your getBM() call subsets the database to the gene names you supply in the values argument of getBM():

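# Note: newer Ensembl releases name the Entrez attribute 'entrezgene_id'
# instead of 'entrezgene'; check with listAttributes(hsmart).
mapping <- getBM(
  attributes = c('ensembl_gene_id', 'ensembl_transcript_id', 'entrezgene', 'hgnc_symbol'),
  filters = 'hgnc_symbol',   # filter on HGNC symbols...
  values = mygenes,          # ...matching the genes in your list
  mart = hsmart
)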

This gives 43 records for your genes, one row per transcript:

library(dplyr)  # provides %>% and arrange()

mapping %>%
  arrange(hgnc_symbol, ensembl_gene_id, ensembl_transcript_id, entrezgene)

#   ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol
#1  ENSG00000132693       ENST00000255030       1401         CRP
#2  ENSG00000132693       ENST00000368110       1401         CRP
#3  ENSG00000132693       ENST00000368111       1401         CRP
#4  ENSG00000132693       ENST00000368112       1401         CRP
#5  ENSG00000132693       ENST00000437342       1401         CRP
#
#   ............................................................
#
#39 ENSG00000228321       ENST00000412275       7124         TNF
#40 ENSG00000228849       ENST00000420425       7124         TNF
#41 ENSG00000228978       ENST00000445232       7124         TNF
#42 ENSG00000230108       ENST00000443707       7124         TNF
#43 ENSG00000232810       ENST00000449264       7124         TNF
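Because the rows are per transcript, each symbol appears several times. Below is a minimal base-R sketch for collapsing such a result to unique symbol/Entrez pairs and listing symbols that got no match. The same %in% subsetting also answers the original question if you have already downloaded the full, unfiltered table. The toy table reuses a few rows from the output above. The GENE_X row and its placeholder IDs are hypothetical, and the column name entrezgene is assumed (it may be entrezgene_id on newer releases). In real use, substitute mapping and mygenes.

```r
# Toy stand-in for a getBM() result: CRP/TNF rows copied from the output above;
# GENE_X and its IDs are hypothetical placeholders for a gene outside the list.
toy_mapping <- data.frame(
  ensembl_gene_id       = c("ENSG00000132693", "ENSG00000132693",
                            "ENSG00000230108", "ENSG00000232810", "ENSG_PLACEHOLDER"),
  ensembl_transcript_id = c("ENST00000255030", "ENST00000368110",
                            "ENST00000443707", "ENST00000449264", "ENST_PLACEHOLDER"),
  entrezgene            = c(1401L, 1401L, 7124L, 7124L, NA),
  hgnc_symbol           = c("CRP", "CRP", "TNF", "TNF", "GENE_X"),
  stringsAsFactors      = FALSE
)
toy_genes <- c("TNF", "IL6", "CRP")

# Keep only rows whose symbol is in the gene list (works on an unfiltered table too)
hits <- toy_mapping[toy_mapping$hgnc_symbol %in% toy_genes, ]

# Collapse transcript-level rows to unique symbol -> Entrez pairs
entrez_map <- unique(hits[, c("hgnc_symbol", "entrezgene")])
rownames(entrez_map) <- NULL

# Symbols from the list for which no rows were returned
unmatched <- setdiff(toy_genes, hits$hgnc_symbol)

entrez_map
unmatched
```

unique() keeps every distinct pair, so a symbol that maps to more than one Entrez ID (or to NA) still shows up on several rows. Inspect those rows rather than silently picking one.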
