
How do I convert a list of Agilent probe IDs to gene symbols using biomaRt and keep NA values for probes with no match?

I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but I'm running into problems. Using the getBM function, I can see that only 22k of the probes have corresponding gene symbols, but the output is a vector of length 22k, so I cannot tell which symbol belongs to which probe ID in my original list. Using getBMlist, I get an output with NA values for the probes that don't match, but the function warns that getBMlist isn't meant for large lists. How do I get an output of 90k entries, with gene symbols where they exist and NA values elsewhere?

To keep the mapping between probe ID and gene symbol, include the probe ID itself in the attributes you request from biomaRt, then merge the result back onto your original list.

Here's how I did it for some of my work using Agilent microarrays:

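library(biomaRt)

# Example input IDs to annotate
genes <- c("A_23_P10060", "A_23_P10091", "A_23_P103951", "A_23_P10525", "A_23_P105732", "A_23_P10605", "NM_005325")

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Request the probe ID as an attribute too, so each symbol stays paired with its probe.
# listAttributes(ensembl) shows the attribute names available in your Ensembl release.
agilent.df <- getBM(attributes = c("hgnc_symbol", "efg_agilent_wholegenome_4x44k_v1"),
                    filters = "efg_agilent_wholegenome_4x44k_v1",
                    values = genes, mart = ensembl)

# Left join: all.x = TRUE keeps every input ID; IDs with no match get NA
genes <- merge(x = as.data.frame(genes), y = agilent.df,
               by.x = "genes", by.y = "efg_agilent_wholegenome_4x44k_v1", all.x = TRUE)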
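If you need a plain vector of the same length and order as the original probe list, rather than a merged data frame, one option is to look the symbols up with match(). The following is only a sketch: the mapping data frame is a small hypothetical stand-in for what getBM returns (the symbols "GENE_A" and "GENE_B" and the column name probe_id are made up), so it runs without querying Ensembl.

```r
# Original probe list, in the order you want the output to follow
probes <- c("A_23_P10060", "A_23_P10091", "NM_005325", "A_23_P10525")

# Hypothetical stand-in for a getBM() result: fewer rows, different order
mapping <- data.frame(
  hgnc_symbol = c("GENE_B", "GENE_A"),
  probe_id    = c("A_23_P10525", "A_23_P10060"),
  stringsAsFactors = FALSE
)

# match() gives, for each probe, the row of its first hit in mapping (NA if none)
symbols <- mapping$hgnc_symbol[match(probes, mapping$probe_id)]
names(symbols) <- probes

symbols
```

Note that match() keeps only the first hit, so a probe that maps to several symbols gets just one of them here, whereas merge() would return one row per mapping.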

There is a very good biomaRt tutorial that walks you through the same process. If you run this code, you'll notice that one probe has "" as its hgnc_symbol; that is because the probe exists in the Ensembl mart but has no designated gene symbol.
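If you would rather treat probes with an empty symbol the same way as probes with no match at all, you can turn the empty strings into NA after merging. A minimal sketch, using a hypothetical merged data frame (the values are invented for illustration) in place of the real result:

```r
# Hypothetical merged result: "" means the probe is in the mart but has no symbol,
# NA means the ID was not found at all
merged <- data.frame(
  genes       = c("A_23_P10060", "A_23_P10091", "NM_005325"),
  hgnc_symbol = c("GENE_A", "", NA),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so existing NA values are left untouched
merged$hgnc_symbol[merged$hgnc_symbol %in% ""] <- NA

# Count of probes without a usable gene symbol
n_missing <- sum(is.na(merged$hgnc_symbol))
n_missing
```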
