
Clustering genes based on function

We would like to use either hierarchical or k-means clustering to cluster the genes in our dataset based on their function. We have the GO ids for each gene, and we would like to group the genes by function, preferably hierarchically: from the bottom level (where each function is unique) up to higher levels (where functions are more general or grouped together). We are programming in R.

Thanks in advance for your help!
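Below is a minimal sketch of the bottom-up grouping the question describes. It is written in Python for illustration rather than R. Each gene is treated as the set of its GO ids, dissimilarity is measured with the Jaccard distance, and genes are merged with average linkage, so the merge history runs from single genes up to broad groups. The gene names and term ids (`T1`, `T2`, ...) are hypothetical placeholders, and the brute-force search is only suitable for small examples.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(annotations):
    """Agglomerative clustering with average linkage (UPGMA).

    annotations: dict gene -> set of GO ids.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of gene names.
    """
    genes = sorted(annotations)
    # Pairwise gene-gene distances, computed once.
    dist = {}
    for g, h in combinations(genes, 2):
        d = jaccard_distance(annotations[g], annotations[h])
        dist[(g, h)] = dist[(h, g)] = d

    def cluster_distance(c1, c2):
        # Average linkage: mean of all gene-gene distances across the two clusters.
        total = sum(dist[(g, h)] for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([g]) for g in genes]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (ties go to the first pair found).
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = cluster_distance(clusters[i], clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical genes annotated with placeholder term ids.
annotations = {
    "geneA": {"T1", "T2"},
    "geneB": {"T1", "T2", "T3"},
    "geneC": {"T4"},
    "geneD": {"T4", "T5"},
}

for a, b, d in average_linkage(annotations):
    print(sorted(a), "+", sorted(b), f"at distance {d:.3f}")
```

With these inputs, geneA and geneB merge first at 0.333, then geneC and geneD at 0.500, and finally the two pairs at 1.000. In R, a comparable tree should come from `hclust(dist(m, method = "binary"), method = "average")` on a 0/1 gene-by-term matrix `m`, because the `"binary"` method of `dist` is the Jaccard distance. Plain Jaccard treats GO terms as unrelated labels, so a gene annotated to a parent term and a gene annotated to its child term share nothing unless the annotations are first propagated up the GO hierarchy.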

Usually, one either performs a differential expression analysis between two conditions or clusters genes based on their expression across conditions or time points. After that, it is possible to test for overrepresentation of GO terms in the differentially expressed gene sets or in the clusters.
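The overrepresentation step is commonly done with a one-sided hypergeometric test. The following is a minimal Python sketch using only the standard library. It assumes `N` annotated genes in the background universe, `K` of them annotated with the GO term, a tested gene set (cluster or DE list) of size `n`, and `k` set members carrying the term. The counts in the example are made up.

```python
from math import comb


def go_enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background universe
    K: universe genes annotated with the GO term
    n: genes in the tested set (cluster or DE list)
    k: genes in the tested set annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum P(X = i) for i = k .. min(K, n); comb(a, b) is 0 when b > a.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


# Made-up counts: 1000 background genes, 40 with the term,
# a cluster of 50 genes of which 8 carry the term (about 2 expected by chance).
p = go_enrichment_pvalue(N=1000, K=40, n=50, k=8)
print(f"p = {p:.2e}")
```

In R, the same tail probability is `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. Because many GO terms are usually tested at once, the resulting p-values still need a multiple-testing correction (for example with `p.adjust`).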

You may be interested in GeneMANIA ( http://www.genemania.org/ ): you can enter a list of genes, which will be displayed as a network (with many options for customisation and expansion). This tool will also report the GO terms that are enriched in the network. A second tool of interest is GOrilla ( http://cbl-gorilla.cs.technion.ac.il/ ), which displays the GO hierarchy itself and highlights the GO terms that are enriched.
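To move from specific terms to more general ones, annotations can be propagated up the GO graph and genes grouped under chosen higher-level terms. This follows the GO "true path rule": a gene annotated to a term is implicitly annotated to all of that term's ancestors. The following is a minimal Python sketch. The small parent map and the term and gene ids are hypothetical. Real parent links would come from the ontology itself, for example the `go-basic.obo` file or, in R, Bioconductor's `GO.db`.

```python
def ancestors(term, parents):
    """All ancestors of `term`, given a DAG as a dict term -> list of parent terms."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """Extend each gene's terms with all ancestors of its directly annotated terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


def group_by_terms(annotations, parents, general_terms):
    """Map each chosen higher-level term to the sorted genes that fall under it."""
    full = propagate(annotations, parents)
    return {
        term: sorted(g for g, ts in full.items() if term in ts)
        for term in general_terms
    }


# Hypothetical hierarchy: T1 is the root; T5 has two parents, so this is a DAG.
parents = {
    "T2": ["T1"],
    "T3": ["T1"],
    "T4": ["T2"],
    "T5": ["T2", "T3"],
    "T6": ["T3"],
}
annotations = {"g1": {"T4"}, "g2": {"T5"}, "g3": {"T6"}}

print(group_by_terms(annotations, parents, ["T2", "T3"]))
```

Because GO is a directed acyclic graph rather than a tree, a term such as `T5` here can have several parents. As a result, the groups overlap (`g2` falls under both `T2` and `T3`) instead of forming a strict partition.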

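k-means isn't a good idea for this kind of data: it works on numeric feature vectors and represents each cluster by a mean, which has no natural meaning for sets of GO annotations.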

Instead, look at algorithms specialized for this kind of data, in particular biclustering algorithms.
