简体   繁体   English

颜色密度通过簇组绘制在ggplot2中

[英]Colour Density plots in ggplot2 by cluster groups

I have been clustering some of my values and then grouping them. 我一直在聚集我的一些价值观,然后将它们分组。 I then plot some density plots using ggplot2 and overlay the clusters. 然后,我使用ggplot2绘制一些密度图并覆盖簇。 An example image is below: 示例图片如下: 在此输入图像描述

For each group in the cluster I plot a density plot and overlay them. 对于群集中的每个组,我绘制密度图并覆盖它们。 The colouring in the density plots corresponds to the groupings in the clustering. 密度图中的着色对应于聚类中的分组。

My problem is, I have split the data manually based on the groupings and put them in their own individual text table (see code below). 我的问题是,我已根据分组手动拆分数据并将它们放在各自的文本表中(参见下面的代码)。 This is very inefficient and can be come very tedious for large data sets. 这是非常低效的,并且对于大型数据集来说可能非常繁琐。 How can I dynamically plot the density plots in ggplot2 without separating the clusters into their own individual text tables? 如何在不将聚类分离到各自的文本表中的情况下动态绘制ggplot2的密度图?

The original input tables looks like this before it was split: 原始输入表在拆分之前看起来像这样:

scores <- read.table(textConnection("
file        max        min        avg               lowest
132         5112.0     6520.0     5728.0            5699.0
133         4720.0     6064.0     5299.0            5277.0
5           4617.0     5936.0     5185.0            5165.0
1           4384.0     5613.0     4917.0            4895.0
1010        5008.0     6291.0     5591.0            5545.0
104         4329.0     5554.0     4858.0            4838.0
105         4636.0     5905.0     5193.0            5165.0
35          4304.0     5578.0     4842.0            4831.0
36          4360.0     5580.0     4891.0            4867.0
37          4444.0     5663.0     4979.0            4952.0
31          4328.0     5559.0     4858.0            4839.0
39          4486.0     5736.0     5031.0            5006.0
32          4334.0     5558.0     4864.0            4843.0
"), header=TRUE)

The code I used to generate the plot: Please note combining the base graphics with grid is still not working correctly 我用来生成绘图的代码:请注意将基本图形与网格结合仍然无法正常工作

library(ggplot2)
library(grid)

layout(matrix(c(1,2,3,1,4,5), 2, 3, byrow = TRUE))

# define function to create multi-plot setup (nrow, ncol)
vp.setup <- function(x,y){
grid.newpage()
pushViewport(viewport(layout = grid.layout(x,y)))
}

# define function to easily access layout (row, col)
vp.layout <- function(x,y){
viewport(layout.pos.row=x, layout.pos.col=y)
}

vp.setup(2,3)

file_vals <- read.table(textConnection("
file        avg_vals
133         1.5923
132         1.6351
1010        1.6532
104         1.6824
105         1.6087
39          1.8694
32          1.9934
31          1.9919
37          1.8638
36          1.9691
35          1.9802
1           1.7283
5           1.7637
"), header=TRUE)

red <- read.table(textConnection("
file        max        min        avg               lowest
31          4328.0     5559.0     4858.0            4839.0
32          4334.0     5558.0     4864.0            4843.0
36          4360.0     5580.0     4891.0            4867.0
35          4304.0     5578.0     4842.0            4831.0
"), header=TRUE)

blue <- read.table(textConnection("
file        max        min        avg               lowest
133         4720.0     6064.0     5299.0            5277.0
105         4636.0     5905.0     5193.0            5165.0
104         4329.0     5554.0     4858.0            4838.0
132         5112.0     6520.0     5728.0            5699.0
1010        5008.0     6291.0     5591.0            5545.0
"), header=TRUE)

green <- read.table(textConnection("
file        max        min        avg               lowest
39          4486.0     5736.0     5031.0            5006.0
37          4444.0     5663.0     4979.0            4952.0
5           4617.0     5936.0     5185.0            5165.0
1           4384.0     5613.0     4917.0            4895.0
"), header=TRUE)


# Perform Cluster
d <- dist(file_vals$avg_vals, method = "euclidean")
fit <- hclust(d, method="ward")
plot(fit, labels=file_vals$file)
groups <- cutree(fit, k=3)

cols = c('red', 'blue', 'green', 'purple', 'orange', 'magenta', 'brown', 'chartreuse4','darkgray','cyan1')
rect.hclust(fit, k=3, border=cols)


# Desnity plots
dat = rbind(data.frame(Cluster='Red', max_vals = red$max), data.frame(Cluster='Blue', max_vals = blue$max), data.frame(Cluster='Green', max_vals = green$max))
max = (ggplot(dat,aes(x=max_vals)))
max = max + geom_density(aes(fill=factor(Cluster)), alpha=.3) + xlim(c(3500, 5500)) + scale_fill_manual(values=c("red",'blue',"green"))
max = max + labs(fill = 'Clusters')
print(max, vp=vp.layout(1,2))

dat = rbind(data.frame(Cluster='Red', min_vals = red$min), data.frame(Cluster='Blue', min_vals = blue$min), data.frame(Cluster='Green', min_vals = green$min))
min = (ggplot(dat,aes(x=min_vals)))
min = min + geom_density(aes(fill=factor(Cluster)), alpha=.3) + xlim(c(5000, 7000)) + scale_fill_manual(values=c("red",'blue',"green"))
min = min + labs(fill = 'Clusters')
print(min, vp=vp.layout(1,3))

dat = rbind(data.frame(Cluster='Red', avg_vals = red$avg), data.frame(Cluster='Blue', avg_vals = blue$avg), data.frame(Cluster='Green', avg_vals = green$avg))
avg = (ggplot(dat,aes(x=avg_vals)))
avg = avg + geom_density(aes(fill=factor(Cluster)), alpha=.3) + xlim(c(4000, 6000)) + scale_fill_manual(values=c("red",'blue',"green"))
avg = avg + labs(fill = 'Clusters')
print(avg, vp=vp.layout(2,2))

dat = rbind(data.frame(Cluster='Red', lowest_vals = red$lowest), data.frame(Cluster='Blue', lowest_vals = blue$lowest), data.frame(Cluster='Green', lowest_vals = green$lowest))
lowest = (ggplot(dat,aes(x=lowest_vals)))
lowest = lowest + geom_density(aes(fill=factor(Cluster)), alpha=.3) + xlim(c(4000, 6000)) + scale_fill_manual(values=c("red",'blue',"green"))
lowest = lowest + labs(fill = 'Clusters')
print(lowest, vp=vp.layout(2,3))

In this way you can automatically create your desired plot with 4 panels. 通过这种方式,您可以使用4个面板自动创建所需的绘图。

First, the data: 一,数据:

scores <- read.table(textConnection("
file        max        min        avg               lowest
132         5112.0     6520.0     5728.0            5699.0
133         4720.0     6064.0     5299.0            5277.0
5           4617.0     5936.0     5185.0            5165.0
1           4384.0     5613.0     4917.0            4895.0
1010        5008.0     6291.0     5591.0            5545.0
104         4329.0     5554.0     4858.0            4838.0
105         4636.0     5905.0     5193.0            5165.0
35          4304.0     5578.0     4842.0            4831.0
36          4360.0     5580.0     4891.0            4867.0
37          4444.0     5663.0     4979.0            4952.0
31          4328.0     5559.0     4858.0            4839.0
39          4486.0     5736.0     5031.0            5006.0
32          4334.0     5558.0     4864.0            4843.0
"), header=TRUE)

file_vals <- read.table(textConnection("
file        avg_vals
                                   133         1.5923
                                   132         1.6351
                                   1010        1.6532
                                   104         1.6824
                                   105         1.6087
                                   39          1.8694
                                   32          1.9934
                                   31          1.9919
                                   37          1.8638
                                   36          1.9691
                                   35          1.9802
                                   1           1.7283
                                   5           1.7637
                                   "), header=TRUE)

Both data frames can be merged into a single one: 两个数据框都可以合并为一个:

dat <- merge(scores, file_vals, by = "file")

Fit: 适合:

d <- dist(dat$avg_vals, method = "euclidean")
fit <- hclust(d, method="ward")
groups <- cutree(fit, k=3)
cols <- c('red', 'blue', 'green', 'purple', 'orange', 'magenta', 'brown', 'chartreuse4','darkgray','cyan1')

Add a column with the colour names (based on the fit): 添加带有颜色名称的列(基于拟合):

dat$group <- cols[groups]

Reshape data from wide to long format: 重塑从宽到长格式的数据:

dat_re <- reshape(dat, varying = c("max", "min", "avg", "lowest"), direction = "long", drop = c("file", "avg_vals"), v.names = "value", idvar = "group", times = c("max", "min", "avg", "lowest"), new.row.names = seq(nrow(scores) * 4))

Plot: 情节:

p <- (ggplot(dat_re ,aes(x = value))) +
geom_density(aes(fill = group), alpha=.3) +
scale_fill_manual(values=cols) +
labs(fill = 'Clusters') +
facet_wrap( ~ time)

print(p)

在此输入图像描述

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM