
Get hgnc_symbol/gene_name from ensembl_gene_id

I have this code (taken from here):

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- rownames(res)
G_list <- getBM(
  filters    = "ensembl_gene_id",
  attributes = c("ensembl_gene_id", "entrezgene", "description", "hgnc_symbol"),
  values     = genes,
  mart       = mart
)

But when I check G_list, it is empty.

I understand why:

Here are some examples of the ensembl_gene_id values in genes:

"ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16"

If I give these IDs to getBM(), it returns nothing.

However, if I delete the dot and the number after it, like this:

"ENSG00000260727", "ENSG00000277521", "ENSG00000116514"

I get the expected results.

Is there a way to pass gene IDs with the dot suffix and still get the expected results?

Not an answer but a bit too long for a comment; happy to remove if deemed not appropriate.

In short, yes, you need to remove the ".digit" suffix from the Ensembl gene ID. The number after the dot is the version number associated with the stable Ensembl identifier.

From the Ensembl documentation on stable IDs:

When reassigning stable identifiers between reannotation we can optionally choose to increment the version number assigned with a stable identifier. We do so to indicate an underlying change in the entity.

For genes (i.e. Ensembl identifiers of the form ENSG*), the version number increments when the set of transcripts linked to a gene changes.
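As a minimal sketch of the suffix removal described above, the version can be stripped in base R with a regular expression before querying. The helper name strip_version and the id_map lookup table are illustrative names, not part of biomaRt, and the example IDs are the ones from the question.

```r
# Example versioned IDs taken from the question
genes <- c("ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16")

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
# IDs that carry no suffix are returned unchanged.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

genes_clean <- strip_version(genes)

# Keep a lookup table so query results can be joined back
# to the original (versioned) row names later.
id_map <- data.frame(
  versioned       = genes,
  ensembl_gene_id = genes_clean,
  stringsAsFactors = FALSE
)
```

genes_clean can then be passed as values to getBM(), and the result merged with id_map on ensembl_gene_id to recover the original row names.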

This post is in fact a duplicate of a post on Biostars, "Question: Mapping Ensembl Gene IDs with dot suffix"; you should take a look at some of the R solutions discussed there.


Postscript

Instead of using biomaRt, it's often better/faster to use one of the existing annotation packages from Bioconductor. For example, take a look at
