How do I make densities with different sizes have the same smoothness in ggplot2?

Question

I have a dataset that contains a continuous variable for which I want to display the density and a grouping variable that I want to use to split the density. When the sizes of the groups are similar, the density plot comes out fine:

library(ggplot2)
data("lalonde", package = "cobalt")
ggplot(lalonde, aes(x = educ, fill = factor(treat))) + 
   geom_density(alpha = .5)

Now, let's say my groups are of different sizes, but the same relative frequencies for each variable are present within each group. In the example below, I simply replicate the rows of one of the groups many times while keeping the other group as it was.

bigll <- do.call("rbind", c(list(lalonde), replicate(100, 
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))
ggplot(bigll, aes(x = educ, fill = factor(treat))) + 
  geom_density(alpha = .5)

It appears much less smooth. Is there a way to adjust the smoothness parameters by group so that the second plot looks more like the first? That is, can I change the smoothness parameters to the lowest common denominator so that the densities can be visually compared more easily?

Answer 1

With the help of @Carlos and others, I found what I was looking for. It's true that the smoothness of a density should typically reflect the size of the sample, as Carlos mentioned, but in my case I wanted the two densities to share the same bandwidth; in particular, I wanted both to use the bandwidth of the smaller group. The default bandwidth selector in ggplot2 is bw.nrd0; I can apply it to the smaller group and then set the result as the global bandwidth for my plot.

bw <- bw.nrd0(bigll$educ[bigll$treat == 1])
ggplot(bigll, aes(x = educ, fill = factor(treat))) + 
       geom_density(alpha = .5, bw = bw)

That definitely obscures some of the detail in the larger distribution, but for my purposes this was sufficient.
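Why a shared bandwidth evens out the smoothness can be checked with base R alone. The following is a minimal sketch, assuming geom_density() hands bw on to stats::density() (my understanding of its default stat); the vectors small and large are simulated stand-ins, not the lalonde data.

```r
# Simulated stand-ins for the two groups (hypothetical data, not lalonde).
set.seed(1)
small <- rnorm(200, mean = 10, sd = 2)
large <- rep(small, times = 101)   # same values, 101 times as many rows

# bw.nrd0() scales with n^(-1/5), so the replicated group gets a
# much narrower default bandwidth even though its shape is unchanged.
bw_small <- bw.nrd0(small)
bw_large <- bw.nrd0(large)

# Force both estimates to use the smaller group's bandwidth.
d_small <- density(small, bw = bw_small)
d_large <- density(large, bw = bw_small)

c(bw_small = bw_small, bw_large = bw_large)
c(d_small$bw, d_large$bw)   # both equal bw_small
```

Comparing bw_small with bw_large shows how far apart the per-group defaults are before the bandwidth is fixed.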

Answer 2

"Smoothness" is not a parameter; it is the result of the estimated bandwidth. You can use adjust to scale the bandwidth by a multiplier, which increases the smoothness of both groups:

ggplot(bigll, aes(x = educ, fill = factor(treat))) + 
  geom_density(alpha = .5, adjust = 2)

Following that logic, you can plot each group separately and apply a different multiplier for each one:

ggplot() + 
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 0),
    fill = '#EB675F', alpha = .5,
    adjust = 3) +
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 1),
    fill = '#35C1C4', alpha = .5,
    adjust = 1.5)

This is a simplistic solution. For suggestions on how to use a better function to calculate values for each group, see the post "Understanding bandwidth smoothing in ggplot2".
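One possible way to choose the multipliers instead of guessing them is to divide a target bandwidth by each group's own default bandwidth. This is only a sketch under stated assumptions: the groups g_small and g_large are simulated, the names target_bw, adj_small and adj_large are made up for illustration, and it relies on adjust acting as a plain multiplier on the default bandwidth, as it does in stats::density().

```r
# Hypothetical simulated groups of very different sizes.
set.seed(2)
g_small <- rnorm(150, mean = 10, sd = 2)
g_large <- rnorm(15000, mean = 10, sd = 2)

# Target: the default bandwidth of the smaller group.
target_bw <- bw.nrd0(g_small)

# adjust multiplies the default bandwidth, so target / default
# gives the multiplier that reproduces the target for each group.
adj_small <- target_bw / bw.nrd0(g_small)   # 1 by construction
adj_large <- target_bw / bw.nrd0(g_large)   # greater than 1 here

d_large <- density(g_large, adjust = adj_large)
d_large$bw   # matches target_bw up to floating-point error
```

Multipliers computed this way could be passed as adjust in the two geom_density() layers above, in place of the hand-picked 3 and 1.5.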

But be cautious about doing this when analyzing your data. The greater roughness when you replicate one of the groups is a correct reflection of the change you've made. A sample consisting of (2,4,6) is not the same thing as (2,2,2,2,4,4,4,4,6,6,6,6). In the first case, there is a good chance that intermediate values exist but were not sampled. In the second, there is a high chance that the data really do occur only at discrete intervals.
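The effect on the default bandwidth can be seen directly with the two toy samples. This is a small sketch using bw.nrd0(), the default selector mentioned in the other answer; the variable names are illustrative only.

```r
x_small <- c(2, 4, 6)
x_rep   <- rep(x_small, each = 4)   # 2,2,2,2,4,4,4,4,6,6,6,6

bw_a <- bw.nrd0(x_small)
bw_b <- bw.nrd0(x_rep)

# The replicated sample has more observations at the same three
# values, so the selector returns a narrower bandwidth.
c(original = bw_a, replicated = bw_b)
```

A narrower bandwidth means each of the three values gets its own sharper bump, which is the extra roughness seen in the plot.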

2 answers

solution1
1 ACCEPTED 2018-07-14 21:23:07

solution2
0 2018-07-14 18:18:15
