
Plotting two overlapping density curves using ggplot

I have a data frame in R with 104 columns that looks like this:

   id         vcr1       vcr2         vcr3  sim_vcr1  sim_vcr2  sim_vcr3  sim_vcr4  sim_vcr5  sim_vcr6  sim_vcr7
1 2913 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
2 1260  0.003768704  3.1577108 -0.758378208  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
3 2912 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
4 2914 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
5 2915 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
6 1261  2.372950077 -0.7022792 -4.951318264 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574

The "sim_vcr*" variables go all the way up to sim_vcr100.

I need two overlapping density curves contained within one plot, looking something like this (except here you see 5 instead of 2):

[image: example plot with five overlapping, semi-transparent density curves]

I need one of the density curves to consist of all values contained in columns vcr1, vcr2, and vcr3, and I need another density curve containing all values in all of the sim_vcr* columns (so 100 columns, sim_vcr1-sim_vcr100)

Because the two curves overlap, they need to be transparent, like in the attached image. I know that there is a pretty straightforward way to do this using the ggplot command, but I am having trouble with the syntax, as well as getting my data frame oriented correctly so that each histogram pulls from the proper columns.

Any help is much appreciated.

With df being the data from your post, you can try the following. First split the data into two data frames, then plot each one:

library(tidyverse)
library(gdata)
# Indices of the vcr* and sim_vcr* columns
i1 <- which(startsWith(names(df), 'vcr'))
i2 <- which(startsWith(names(df), 'sim'))
# Keep id plus each group of columns
df1 <- df[, c(1, i1)]
df2 <- df[, c(1, i2)]
# Reshape to long format (columns: id, name, value)
M1 <- pivot_longer(df1, cols = names(df1)[-1])
M2 <- pivot_longer(df2, cols = names(df2)[-1])
# Plot 1: one curve per vcr column
ggplot(M1) + geom_density(aes(x = value, fill = name), alpha = .5)
# Plot 2: one curve per sim_vcr column
ggplot(M2) + geom_density(aes(x = value, fill = name), alpha = .5)

[image: output of Plot 1]

[image: output of Plot 2]

Update

Use the following code to get both curves in one plot:

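# Single plot
# Reshape every column except id to long format
M <- pivot_longer(df, cols = names(df)[-1])
# Label each value by the group its column belongs to
M$var <- ifelse(startsWith(M$name, 'vcr'), 'vcr', 'sim_vcr')
# Plot 3: two pooled, semi-transparent curves
ggplot(M) + geom_density(aes(x = value, fill = var), alpha = .5)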

[image: output of Plot 3]
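To try the single-plot approach without the original data, here is a minimal sketch on made-up values. The data frame toy and its random contents are assumptions for illustration only; just the column naming (id, vcr1 to vcr3, sim_vcr1 to sim_vcr100) follows the question.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the real df: 6 rows of random values
set.seed(1)
n <- 6
toy <- data.frame(id = seq_len(n))
for (v in paste0("vcr", 1:3)) toy[[v]] <- rnorm(n)
for (v in paste0("sim_vcr", 1:100)) toy[[v]] <- rnorm(n, sd = 3)

# Long format: one row per (id, column) pair, in columns name and value
long <- pivot_longer(toy, cols = -id)

# Two groups only: observed vcr columns vs. simulated sim_vcr columns
long$var <- ifelse(startsWith(long$name, "vcr"), "vcr", "sim_vcr")

# One plot with two semi-transparent curves
p <- ggplot(long, aes(x = value, fill = var)) +
  geom_density(alpha = 0.5)
print(p)
```

By default geom_density() normalizes each group separately, so the very different number of values in the two groups does not change the relative height of the curves.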

First, you can convert your data to long format using the function pivot_longer from the tidyr package (the %<>% assignment pipe comes from magrittr) as follows:

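library(tidyverse)  # tidyr, dplyr, stringr, ggplot2
library(magrittr)   # provides the %<>% assignment pipe
df %<>% pivot_longer(cols = c(starts_with('vcr'), starts_with('sim_vcr')),
                         names_to = c('type'),
                         values_to = c('values'))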

Then, using the filter function, you can create a separate plot for each value type. For the vcr columns:

df %>% 
  filter(str_detect(type, '^vcr')) %>%
  ggplot(.) +
  geom_density(aes(x = values, fill = type), alpha = 0.5)

The above produces the following plot:

[image]

For the sim_vcr columns:

df %>%
  filter(str_detect(type, '^sim_vcr')) %>%
  ggplot(.) +
  geom_density(aes(x = values, fill = type), alpha = 0.5)

The above code produces the following plot:

[image]
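The two filtered plots above draw one curve per column. If the goal is the single plot from the question, with one pooled curve per group, the same pipeline can be extended roughly as follows. This is a sketch on a small made-up tibble (toy, with only two sim_vcr columns); the group column is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

# Hypothetical toy data with the same naming scheme as the question
set.seed(42)
toy <- tibble(id = 1:6,
              vcr1 = rnorm(6), vcr2 = rnorm(6), vcr3 = rnorm(6),
              sim_vcr1 = rnorm(6, sd = 3), sim_vcr2 = rnorm(6, sd = 3))

# Long format, plus a two-level label derived from the column name
pooled <- toy %>%
  pivot_longer(cols = -id, names_to = "type", values_to = "values") %>%
  mutate(group = if_else(str_detect(type, "^sim_vcr"), "sim_vcr", "vcr"))

# Filling by group instead of type gives two curves in one plot
p <- ggplot(pooled, aes(x = values, fill = group)) +
  geom_density(alpha = 0.5)
print(p)
```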

Another simple way to subset and prepare your data for ggplot is with gather() from tidyr. Here's how I do it, with df being the data frame you provided.

# Load tidyr to use gather(), and ggplot2 for plotting
library(tidyr)
library(ggplot2)

# Gather the vcr columns (columns 2 to 4) on their own
df_vcr <- gather(data = df[, 2:4])

# Gather the remaining sim_vcr columns
df_sim <- gather(data = df[, -c(1:4)])

#Plot the first
ggplot() + 
  geom_density(data = df_vcr, 
               mapping = aes(value, group = key, color = key, fill = key),
               alpha = 0.5)
#Plot the second
ggplot() + 
  geom_density(data = df_sim,
               mapping = aes(value, group = key, color = key, fill = key),
               alpha = 0.5) 

[image: output of the first plot]

[image: output of the second plot]

However, I am a little unclear on what you mean by "all values in all of the sim_vcr* columns". Perhaps you want all of those values in one density curve? To do this, simply do not give ggplot any grouping info in the second case.

ggplot() + geom_density(data = df_sim,
           mapping = aes(value),
           fill = "grey50",
           alpha = 0.5)

[image]

Notice that I can still specify the 'fill' for the curve outside of the aes() function. It then applies to all curves instead of giving each group specified in 'key' a different color.
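Following the same idea, both pooled curves can share one plot by adding two geom_density() layers, each with its own data and a constant fill. The sketch below uses a small made-up data frame (toy), and the fill colors are arbitrary choices.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data with the same column layout as the question
set.seed(7)
toy <- data.frame(id = 1:6,
                  vcr1 = rnorm(6), vcr2 = rnorm(6), vcr3 = rnorm(6),
                  sim_vcr1 = rnorm(6, sd = 3), sim_vcr2 = rnorm(6, sd = 3))

# gather() with no key/value names returns columns "key" and "value"
df_vcr <- gather(data = toy[, 2:4])
df_sim <- gather(data = toy[, -c(1:4)])

# No grouping in aes(), so each layer pools all of its values into one curve
p <- ggplot() +
  geom_density(data = df_vcr, mapping = aes(value),
               fill = "steelblue", alpha = 0.5) +
  geom_density(data = df_sim, mapping = aes(value),
               fill = "grey50", alpha = 0.5)
print(p)
```

Because the fills are set outside aes(), this version has no legend. Mapping fill to a grouping column produces one automatically.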


 