Plotting two overlapping density curves using ggplot

Question

I have a data frame in R with 104 columns that looks like this:

   id         vcr1       vcr2         vcr3  sim_vcr1  sim_vcr2  sim_vcr3  sim_vcr4  sim_vcr5  sim_vcr6  sim_vcr7
1 2913 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
2 1260  0.003768704  3.1577108 -0.758378208  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
3 2912 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
4 2914 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
5 2915 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
6 1261  2.372950077 -0.7022792 -4.951318264 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574

The "sim_vcr*" variables go all the way through sim_vcr100.

I need two overlapping density curves in one plot, looking something like the attached image (except that it shows 5 curves instead of 2).

I need one of the density curves to consist of all values contained in columns vcr1, vcr2, and vcr3, and the other to contain all values in all of the sim_vcr* columns (so 100 columns, sim_vcr1 through sim_vcr100).

Because the two curves overlap, they need to be transparent, like in the attached image. I know that there is a pretty straightforward way to do this with ggplot, but I am having trouble with the syntax, as well as with getting my data frame oriented correctly so that each curve pulls from the proper columns.

Any help is much appreciated.

Answer 1

With df being the data you mentioned in your post, you can try this.

First split the data frame into two with the following code, then plot each part:

library(tidyverse)
# Column indices for each group (base R startsWith(), no extra package needed)
i1 <- which(startsWith(names(df), 'vcr'))
i2 <- which(startsWith(names(df), 'sim'))
# Isolate each group, keeping the id column
df1 <- df[, c(1, i1)]
df2 <- df[, c(1, i2)]
# Reshape to long format (columns: id, name, value)
M1 <- pivot_longer(df1, cols = names(df1)[-1])
M2 <- pivot_longer(df2, cols = names(df2)[-1])
# Plot 1: one curve per vcr column
ggplot(M1) + geom_density(aes(x = value, fill = name), alpha = .5)
# Plot 2: one curve per sim_vcr column
ggplot(M2) + geom_density(aes(x = value, fill = name), alpha = .5)

Update

Use the following code to draw both curves in one plot:

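# Single plot
# Reshape every column except id to long format
M <- pivot_longer(df, cols = names(df)[-1])
# Label each row by the group its source column belongs to
M$var <- ifelse(startsWith(M$name, 'vcr'), 'vcr', 'sim_vcr')
# Plot 3: one curve per group
ggplot(M) + geom_density(aes(x = value, fill = var), alpha = .5)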
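To try the single-plot approach without the original data, here is a minimal, self-contained sketch. The data frame toy and its random values are made up for illustration (only the column names follow the question), so the shape of the curves means nothing.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the real data: 6 rows, 3 vcr columns, 100 sim_vcr columns
set.seed(1)
n <- 6
toy <- data.frame(id = seq_len(n))
for (j in 1:3)   toy[[paste0("vcr", j)]]     <- rnorm(n, mean = 0, sd = 2)
for (j in 1:100) toy[[paste0("sim_vcr", j)]] <- rnorm(n, mean = 1, sd = 5)

# Long format: one row per (id, column) pair
long <- pivot_longer(toy, cols = -id, names_to = "name", values_to = "value")

# Two groups only: observed (vcr*) versus simulated (sim_vcr*)
long$var <- ifelse(startsWith(long$name, "vcr"), "vcr", "sim_vcr")

# One semi-transparent density curve per group
p <- ggplot(long, aes(x = value, fill = var)) +
  geom_density(alpha = 0.5)
print(p)
```

Because fill is mapped to var inside aes(), ggplot draws one curve per group and adds a legend automatically.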

Answer 2

First, convert your data to long format with pivot_longer() from the tidyr package (attached by the tidyverse). The %<>% assignment pipe used here comes from magrittr, which has to be loaded separately:

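library(tidyverse)  # pivot_longer(), filter(), str_detect(), ggplot()
library(magrittr)   # %<>% is not attached by the tidyverse
# Overwrite df with its long form: one row per (id, column) pair
df %<>% pivot_longer(cols = c(starts_with('vcr'), starts_with('sim_vcr')),
                     names_to = 'type',
                     values_to = 'values')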

Then use filter() to create a separate plot for each type of column. For the vcr columns:

df %>% 
  filter(str_detect(type, '^vcr')) %>%
  ggplot(.) +
  geom_density(aes(x = values, fill = type), alpha = 0.5)

This draws one density curve for each of vcr1, vcr2, and vcr3. For the sim_vcr columns:

df %>%
  filter(str_detect(type, '^sim_vcr')) %>%
  ggplot(.) +
  geom_density(aes(x = values, fill = type), alpha = 0.5)

This draws one density curve for each sim_vcr column.
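If the goal is two pooled curves in a single plot rather than one curve per column, the long-format data can be collapsed into two groups first. The following is only a sketch on made-up data: toy, long, and the group column are hypothetical names, and the trailing digits of each column name are stripped with sub() to recover the group.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data with the same column naming as in the question
set.seed(42)
m_vcr <- matrix(rnorm(6 * 3), nrow = 6,
                dimnames = list(NULL, paste0("vcr", 1:3)))
m_sim <- matrix(rnorm(6 * 100, sd = 4), nrow = 6,
                dimnames = list(NULL, paste0("sim_vcr", 1:100)))
toy <- cbind(data.frame(id = 1:6), as.data.frame(m_vcr), as.data.frame(m_sim))

long <- toy %>%
  pivot_longer(cols = -id, names_to = "type", values_to = "values") %>%
  # "vcr2" becomes "vcr", "sim_vcr57" becomes "sim_vcr"
  mutate(group = sub("[0-9]+$", "", type))

# One pooled, semi-transparent curve per group
p <- ggplot(long, aes(x = values, fill = group)) +
  geom_density(alpha = 0.5)
print(p)
```

Mapping fill to group instead of type is what reduces the 103 per-column curves to two.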

Answer 3

Another simple way to subset and prepare your data for ggplot is with gather() from tidyr. Here's how I do it, with df being the data frame you provided.

# Load tidyr to use gather(), and ggplot2 for the plots
library(tidyr)
library(ggplot2)

# Split off the vcr columns (columns 2 to 4) and gather them into key/value pairs
df_vcr <- gather(data = df[, 2:4])

# Gather the remaining sim_vcr columns (everything except id and the vcr columns)
df_sim <- gather(data = df[, -c(1:4)])

#Plot the first
ggplot() + 
  geom_density(data = df_vcr, 
               mapping = aes(value, group = key, color = key, fill = key),
               alpha = 0.5)
#Plot the second
ggplot() + 
  geom_density(data = df_sim,
               mapping = aes(value, group = key, color = key, fill = key),
               alpha = 0.5)

However, I am a little unclear on what you mean by "all values in all of the sim_vcr* columns". Perhaps you want all of those values in one density curve? To do this, simply do not give ggplot any grouping info in the second case.

ggplot() + geom_density(data = df_sim,
           mapping = aes(value),
           fill = "grey50",
           alpha = 0.5)

Notice that I can still specify the fill for the curve outside of the aes() function. It then applies to all curves instead of giving each group specified in key a different color.
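The same idea extends to putting both pooled curves in one plot: add two geom_density() layers, each with its own data and a fixed fill set outside aes(). This is a sketch on made-up data (toy and the fill colors are arbitrary choices, not taken from the question). Note that gather() still works but has been superseded by pivot_longer() in current tidyr.

```r
library(tidyr)
library(ggplot2)

# Hypothetical data with the same column layout as in the question
set.seed(7)
m_vcr <- matrix(rnorm(6 * 3), nrow = 6,
                dimnames = list(NULL, paste0("vcr", 1:3)))
m_sim <- matrix(rnorm(6 * 100, sd = 4), nrow = 6,
                dimnames = list(NULL, paste0("sim_vcr", 1:100)))
toy <- cbind(data.frame(id = 1:6), as.data.frame(m_vcr), as.data.frame(m_sim))

# Gather each block of columns into key/value pairs
df_vcr <- gather(data = toy[, 2:4])
df_sim <- gather(data = toy[, -c(1:4)])

# Two layers, no grouping: each layer pools all of its values into one curve
p <- ggplot() +
  geom_density(data = df_vcr, mapping = aes(value),
               fill = "steelblue", alpha = 0.5) +
  geom_density(data = df_sim, mapping = aes(value),
               fill = "grey50", alpha = 0.5)
print(p)
```

Since the fills are set rather than mapped, this version has no legend. Mapping fill to a grouping column inside aes() is the way to get one.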

3 answers

solution1
0 ACCEPTED 2020-07-17 20:40:58

solution2
0 2020-07-17 20:57:03

solution3
0 2020-07-17 21:06:28
