
Clustering and heatmap in R

I am new to R and am trying to cluster a data table in which rows represent individual objects and columns represent the features measured for those objects. I have worked through some clustering tutorials and I do get output, but the heatmap I get after clustering does not correspond at all to the heatmap produced from the same data table by another programme. That programme's heatmap shows clear differences in marker expression between the objects, whereas mine shows little difference and no recognizable clustering (i.e., colour) pattern: it looks like a randomly jumbled set of similar colours with no strong contrast. Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.

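# Read the table (rows = objects, columns = measured features)
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
# Natural log of the raw values, used only for the clustering step
datalog <- log(datamat)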

I use log values for the clustering because I know that the other programme does so, too.

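library(gplots)  # provides colorpanel()

# Cluster rows on correlation distance (1 - Pearson r) of the log values
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
# Cut the tree into 7 clusters and give each cluster a random rainbow colour
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
# Plot the raw values, with rows ordered by the log-based dendrogram
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
    col=colorpanel(40, "black","yellow","green"),
    scale="column", RowSideColors=mycol)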

Again, I plot the original values but use the clusters computed on the log values, because I know that this is what the other programme does.

I tried playing around with the methods, but I don't get anything that even remotely looks like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have to scale or normalize the data by column somehow). I also tried clustering with k-means, but that didn't help either. My idea was that the colour scale might not be fully used because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, it still did not reveal proper clusters.

Is there anything else I could play around with?

And is it possible to change the colour scale with heatmap so that outliers fall into a last bin that covers "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed, and I also didn't manage to add the row side colours that I use with the heatmap function.

If you are okay with using heatmap.2 from the gplots package, it lets you supply breaks that assign colors to value ranges in your heatmap.
For example, if you had 3 colors (blue, white, and red) with the values going from low to high, you could do something like this:

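# 17 break points define 16 bins, one for each colour in bluered(16)
my.breaks <- c(seq(-5, -.6, length.out=6), seq(-.5999999, .1, length.out=4), seq(.100009, 5, length.out=7))
# mtscaled is the matrix to plot (not defined here); scale='none' applies no further scaling
result <- heatmap.2(mtscaled, Rowv=TRUE, scale='none', dendrogram="row", symm=TRUE, col=bluered(16), breaks=my.breaks)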

In this case you have 3 sets of values that correspond to the 3 colors; the values will of course differ depending on the range of your data.
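To get a last bin that means "everything greater than a particular value", one option is to space the breaks evenly up to a cap and then stretch the outermost breaks to the data range. The sketch below is only an illustration: the toy matrix, the cap of 2 and the 16-colour palette are assumptions, not values from the question. It uses base heatmap (which passes extra arguments such as breaks on to image) so that it runs without gplots; the same my.breaks vector can be given to heatmap.2 through its breaks argument, and heatmap.2 also accepts RowSideColors.

```r
# Hypothetical toy data: 20 objects x 5 markers with two artificial outliers
set.seed(1)
x <- matrix(rnorm(100), nrow = 20,
            dimnames = list(paste0("obj", 1:20), paste0("marker", 1:5)))
x[1, 1] <- 15
x[2, 3] <- 12

# Scale the columns ourselves so the breaks are on a known (z-score) scale
z <- scale(x)

cap   <- 2    # assumed cut-off, chosen only for illustration
n_col <- 16   # number of colours, hence n_col + 1 breaks

# Evenly spaced breaks inside [-cap, cap] ...
my.breaks <- seq(-cap, cap, length.out = n_col + 1)
# ... with the two outer breaks stretched to cover the full data range,
# so the first and last bins collect everything beyond the cap
my.breaks[1]         <- min(my.breaks[1], min(z))
my.breaks[n_col + 1] <- max(my.breaks[n_col + 1], max(z))

my.cols <- colorRampPalette(c("blue", "white", "red"))(n_col)

# scale = "none" because the matrix is already scaled
heatmap(z, Colv = NA, scale = "none", col = my.cols, breaks = my.breaks)
```

With breaks set this way the outliers no longer compress the colour range of the remaining values, because they all share the darkest bin.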

One thing you are doing in your program is calling hclust on your data and then calling heatmap on it. However, if you look at the heatmap manual page, it states: "Defaults to hclust." So I don't think you need to do that. You might want to take a look at some similar questions I asked, which might help point you in the right direction:

Heatmap Question 1

Heatmap Question 2

If you post an image of the heatmap you get and an image of the heatmap that the other program makes, it will be easier for us to help you further.
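One caveat on the point about heatmap defaulting to hclust: the default clustering is run on Euclidean distances (dist) of the matrix that is passed in, so it is not the same as clustering on 1 - Pearson correlation of the log values. If that particular clustering is wanted, either keep the explicit hclust step or pass a distance function through distfun. A rough sketch on made-up data (the matrix and the helper cor_dist are hypothetical names used only for illustration):

```r
# Hypothetical positive-valued data: 12 objects x 5 markers
set.seed(42)
x <- matrix(rexp(60, rate = 0.1) + 1, nrow = 12,
            dimnames = list(paste0("obj", 1:12), paste0("marker", 1:5)))
xlog <- log(x)

# Correlation-based distance between rows (illustrative helper)
cor_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))

# Explicit clustering on the log values
hr_manual <- hclust(cor_dist(xlog), method = "complete")

# Option 1: plot the raw values but order the rows by the log-based tree
hm <- heatmap(x, Rowv = as.dendrogram(hr_manual), Colv = NA, scale = "column")

# Option 2: let heatmap() do the correlation clustering itself;
# here the clustering and the plot both use the log values
hm2 <- heatmap(xlog, Colv = NA, distfun = cor_dist, scale = "column")
```

Option 1 is the only one of the two that clusters on one matrix and plots another, because distfun is always applied to the matrix being plotted.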

