How do I convert a list of Agilent probe IDs to gene symbols using biomaRt and get NA values for unmatched probes?
I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but I'm running into problems. Using the getBM function, I can see that only 22k of them have corresponding gene symbols, but the output is a vector of length 22k, and I can't tell which symbol belongs to which probe ID in my original list. Using getBMlist, I get output with NA values for the probes that don't match, but the function warns that getBMlist isn't meant for large lists. How do I get an output of 90k gene symbols, with NA for the probes that have no match?
To get the mapping between probe ID and gene symbol, you need to include the probe ID in the biomaRt attributes.
Here's how I did it for some of my work using Agilent microarrays:
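library(biomaRt)
# Probe IDs to convert
genes <- c("A_23_P10060", "A_23_P10091", "A_23_P103951", "A_23_P10525", "A_23_P105732", "A_23_P10605", "NM_005325")
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Request the probe ID alongside the symbol so every returned row keeps its probe.
# Attribute names differ between Ensembl releases; check listAttributes(ensembl) for your array.
agilent.df <- getBM(attributes = c("hgnc_symbol", "efg_agilent_wholegenome_4x44k_v1"), filters = "efg_agilent_wholegenome_4x44k_v1", values = genes, mart = ensembl)
# Left join: all.x = TRUE keeps every input probe, with NA where biomaRt returned nothing.
# Rows come back sorted by probe ID, and a probe with several symbols gets several rows.
genes <- merge(x = data.frame(genes = genes), y = agilent.df, by.x = "genes", by.y = "efg_agilent_wholegenome_4x44k_v1", all.x = TRUE)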
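If you need exactly one symbol per input probe (90k in, 90k out, in the original order), `match()` is an alternative to `merge()`, which sorts the rows and repeats probes that map to several genes. The sketch below runs offline: the hand-made `agilent.df` is a hypothetical stand-in for the `getBM()` result, and its probe/symbol pairs are invented for illustration, not real annotations. Probes with several symbols are first collapsed into one string with `aggregate()`; joining them with `;` is an arbitrary choice.

```r
# Input probe IDs, in the order you want the output
probes <- c("A_23_P10060", "A_23_P10091", "A_23_P103951", "NM_005325")

# Hypothetical stand-in for the getBM() result (made-up symbols, for illustration only)
agilent.df <- data.frame(
  efg_agilent_wholegenome_4x44k_v1 = c("A_23_P10060", "A_23_P10091", "A_23_P10091"),
  hgnc_symbol = c("GENEA", "GENEB", "GENEC"),
  stringsAsFactors = FALSE
)

# Collapse probes that map to several symbols into one string per probe
collapsed <- aggregate(hgnc_symbol ~ efg_agilent_wholegenome_4x44k_v1,
                       data = agilent.df,
                       FUN = function(s) paste(unique(s), collapse = ";"))

# match() gives NA for probes absent from the biomaRt result,
# so there is one entry per input probe, in input order
idx <- match(probes, collapsed$efg_agilent_wholegenome_4x44k_v1)
symbols <- unname(collapsed$hgnc_symbol[idx])

result <- data.frame(probe_id = probes, hgnc_symbol = symbols,
                     stringsAsFactors = FALSE)
print(result)
```

Because `match()` indexes by position, `result` always has `length(probes)` rows, so it can be bound directly onto an expression matrix whose rows follow the same probe order.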
There is a very good biomaRt tutorial that walks you through the same process. If you run this code, you'll notice that one probe has "" as its hgnc_symbol; that's because the probe exists in the Ensembl mart but has no designated gene symbol.
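If you would rather have those blank symbols reported as NA too, so that every probe without a usable symbol looks the same, you can recode empty strings after the merge. A minimal sketch on a small made-up data frame (the probe/symbol pairs are hypothetical, not real getBM output):

```r
# Hypothetical merged result: one symbol, one blank symbol, one unmatched probe
merged <- data.frame(
  genes = c("A_23_P10060", "A_23_P10091", "NM_005325"),
  hgnc_symbol = c("GENEA", "", NA),
  stringsAsFactors = FALSE
)

# Treat an empty symbol the same as a probe that was not found at all;
# %in% is used because it returns FALSE (not NA) for existing NA values
merged$hgnc_symbol[merged$hgnc_symbol %in% ""] <- NA
print(merged)

# Count probes with and without a usable symbol
print(table(mapped = !is.na(merged$hgnc_symbol)))
```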