
Density plot for multiple groups in ggplot

I have seen example1, "How to overlay density plots in R?", and "Overlapped density plots in ggplot2" on how to make density plots, and I can produce one with the code from the second link. However, I am wondering how to make such a graph in ggplot or plotly. I have looked at all the examples but cannot adapt them to my problem. I have a toy data frame of leukemia gene expression data (data description) whose columns refer to 2 groups of individuals:

leukemia_big <- read.csv("http://web.stanford.edu/~hastie/CASI_files/DATA/leukemia_big.csv")

df <- data.frame(class= ifelse(grepl("^ALL", colnames(leukemia_big),
                 fixed = FALSE), "ALL", "AML"), row.names = colnames(leukemia_big))

plot(density(as.matrix(leukemia_big[,df$class=="ALL"])), 
     lwd=2, col="red")
lines(density(as.matrix(leukemia_big[,df$class=="AML"])), 
      lwd=2, col="darkgreen")

ggplot requires tidy data, also known as a long-format data frame. The following example does the conversion. But be careful: in the provided dataset the distribution of values is almost identical for both patient types, so when you plot the ALL and AML patients the curves overlap and you cannot see the difference.

library(tidyverse)

leukemia_big %>%
  as_tibble() %>% # Optional, makes df a tibble, which makes debugging easier (as_data_frame() is deprecated)
  gather(key = patient, value = value, 1:72) %>% # transforms a wide df into a tidy or long df
  mutate(type = gsub('[.].*$', '', patient)) %>% # creates a variable with the type of patient
  ggplot(aes(x = value, fill = type)) + geom_density(alpha = 0.5)

[Figure: result with the original data]
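The wide-to-long reshaping used above can be tried on a small table first. The sketch below is only an illustration: `toy` is a made-up stand-in for `leukemia_big`, its column names assume the `ALL`, `ALL.1`, ... pattern that the `gsub()` call relies on, and `pivot_longer()` is used in place of `gather()`, which tidyr now marks as superseded.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: rows are genes, columns are patients.
# The values are invented for illustration, not taken from leukemia_big.
toy <- data.frame(
  ALL   = c(0.1, 0.2, 0.3),
  ALL.1 = c(0.2, 0.1, 0.4),
  AML   = c(0.3, 0.5, 0.2),
  AML.1 = c(0.4, 0.3, 0.1)
)

toy_long <- toy %>%
  # one row per (patient, value) pair instead of one column per patient
  pivot_longer(cols = everything(), names_to = "patient", values_to = "value") %>%
  # strip the ".1" suffix so that "ALL.1" becomes "ALL"
  mutate(type = sub("[.].*$", "", patient))

# number of values per patient type
table(toy_long$type)
```

The resulting `toy_long` has the `value` and `type` columns that `ggplot(aes(x = value, fill = type))` expects.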

In this second example I add 1 unit to the value variable for all AML patients, to visually demonstrate the overlapping problem:

leukemia_big %>%
  as_tibble() %>% # Optional, makes df a tibble, which makes debugging easier (as_data_frame() is deprecated)
  gather(key = patient, value = value, 1:72) %>% # transforms a wide df into a tidy or long df
  mutate(type = gsub('[.].*$', '', patient)) %>% # creates a variable with the type of patient
  mutate(value2 = if_else(condition = type == "ALL", true = value, false = value + 1)) %>% # shifts AML values by 1 so the two curves no longer coincide
  ggplot(aes(x = value2, fill = type)) + geom_density(alpha = 0.5)

[Figure: result with the modified data for AML patients]
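Whether two groups overlap can also be checked numerically before plotting. The following is a minimal sketch on simulated values (not the leukemia data). The names `toy_long` and `group_summary` are made up for illustration. It compares the per-group means before and after the same +1 shift applied to the AML group above.

```r
library(dplyr)

set.seed(42)  # simulated values only, so the numbers are reproducible

# Hypothetical long-format data: both groups are drawn from the same distribution
toy_long <- tibble(
  type  = rep(c("ALL", "AML"), each = 500),
  value = rnorm(1000)
)

group_summary <- toy_long %>%
  # same shift as in the answer: AML values are moved up by 1
  mutate(value2 = if_else(type == "ALL", value, value + 1)) %>%
  group_by(type) %>%
  summarise(
    mean_original = mean(value),
    mean_shifted  = mean(value2),
    .groups = "drop"
  )

group_summary
```

If the `mean_original` values are close for both types, overlapping density curves are to be expected. Only the AML row changes in `mean_shifted`.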


 