Getting hgnc_symbol / gene_name from ensembl_gene_id

Question

I have this code (taken from here):

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- rownames(res)  # row names hold the Ensembl gene IDs
G_list <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "entrezgene", "description", "hgnc_symbol"),
                values = genes, mart = mart)

But when I check G_list, it is empty.

I understand why:

Here are some examples of the ensembl_gene_id values in genes:

"ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16"

If I pass these IDs to getBM(), it returns nothing.

However, if I remove the dot and the number after it, like this:

"ENSG00000260727", "ENSG00000277521", "ENSG00000116514"

I get the expected results.

Is there a way to pass gene IDs with the dot suffix and still get the expected results?

Answer 1

Not an answer but a bit too long for a comment; happy to remove if deemed not appropriate.

In short, yes, you need to remove the "dot digit" part of the Ensembl gene ID. The number is the version of the stable Ensembl identifier.

From the Ensembl documentation on stable IDs:

When reassigning stable identifiers between reannotation we can optionally choose to increment the version number assigned with a stable identifier. We do so to indicate an underlying change in the entity.

For genes (i.e. Ensembl identifiers of the form ENSG*), the version number increments when the set of transcripts linked to a gene changes.

This post is in fact a duplicate of a post on Biostars: "Question: Mapping Ensembl Gene IDs with dot suffix"; you should take a look at some of the R solutions discussed there.
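As a minimal sketch of the stripping step (not taken from the Biostars thread), the version suffix can be removed in base R with a regular expression before calling getBM(). The names stripped and id_map are illustrative only; genes holds the example IDs from the question.

```r
# Example IDs from the question, including the ".version" suffix.
genes <- c("ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16")

# Drop a trailing dot followed by digits; IDs without a suffix are left unchanged.
stripped <- sub("\\.[0-9]+$", "", genes)

# Keep both forms side by side so results can be merged back onto the
# original (versioned) row names later.
id_map <- data.frame(versioned = genes, ensembl_gene_id = stripped,
                     stringsAsFactors = FALSE)
print(id_map)

# `stripped` can then be passed as `values` to getBM() with
# filters = "ensembl_gene_id", as in the question's code.
```

Merging the getBM() result with id_map on the ensembl_gene_id column (for example with merge()) restores the link to the versioned row names.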

Postscript

Instead of using biomaRt, it is often better and faster to use one of the existing annotation packages from Bioconductor. For example, take a look at:

the Ensembl-based annotation package EnsDb.Hsapiens.v86, maintained by Johannes Rainer
the primarily Entrez-gene-based genome-wide annotation package org.Hs.eg.db
the bitr function by Guangchuang Yu. It used to be an independent package but was absorbed into clusterProfiler by the same author, and provides a "universal biological ID translator" function.
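A hedged sketch of the annotation-package route, using org.Hs.eg.db through AnnotationDbi::mapIds(). It assumes both Bioconductor packages are installed and skips the lookup otherwise; the variable names ids and symbols are illustrative. The version suffix still has to be stripped first, because the ENSEMBL keys in org.Hs.eg.db are unversioned.

```r
# Versioned IDs from the question; strip the version before the lookup.
genes <- c("ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16")
ids <- sub("\\.[0-9]+$", "", genes)

# Assumption: AnnotationDbi and org.Hs.eg.db are installed from Bioconductor.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Map Ensembl gene IDs to gene symbols; when an ID maps to several
  # symbols, multiVals = "first" keeps only the first match.
  symbols <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                   keys = ids,
                                   column = "SYMBOL",
                                   keytype = "ENSEMBL",
                                   multiVals = "first")
  print(symbols)
}
```

IDs that the package does not know come back as NA, so it is worth checking how many entries are missing before relying on the result.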
