
How do I make densities with different sizes have the same smoothness in ggplot2?

I have a dataset that contains a continuous variable for which I want to display the density and a grouping variable that I want to use to split the density. When the sizes of the groups are similar, the density plot comes out fine:

library(ggplot2)
data("lalonde", package = "cobalt")
ggplot(lalonde, aes(x = educ, fill = factor(treat))) + 
   geom_density(alpha = .5)


Now, let's say my groups are of different sizes, but the same relative frequencies for each variable are present within each group. In the example below, I simply replicate the rows of one of the groups many times while keeping the other group as it was.

# Append 100 extra copies of the treat == 0 rows to the original data
bigll <- do.call("rbind", c(list(lalonde),
                            replicate(100, lalonde[lalonde$treat == 0, ],
                                      simplify = FALSE)))
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5)


The density for the enlarged group now appears much less smooth. Is there a way to adjust the smoothness parameters by group so that the second plot looks more like the first? That is, can I set the smoothness parameters to the lowest common denominator so that the densities can be compared visually more easily?

With the help of @Carlos and others, I found what I was looking for. It's true that the smoothness of the density should typically reflect the size of the sample, as Carlos mentioned, but in my case I wanted the two densities to share the same bandwidth; in particular, I wanted both to use the bandwidth of the smaller group. The default bandwidth rule in ggplot2 is bw.nrd0; I can apply it to the smaller group and then set the result as the global bandwidth for my plot.

# Bandwidth that the default rule picks for the smaller group (treat == 1)
bw <- bw.nrd0(bigll$educ[bigll$treat == 1])
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5, bw = bw)


That definitely obscures some of the detail in the larger distribution, but for my purposes this was sufficient.
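To see why the replicated group gets a smaller default bandwidth in the first place, here is a minimal sketch using simulated values (an assumption for illustration, not the lalonde data). bw.nrd0 multiplies a spread estimate by n^(-1/5), so copying the same rows 101 times should shrink the bandwidth to roughly 101^(-1/5), about 0.40, of its original value even though the relative frequencies are unchanged.

```r
# Simulated stand-in for a discrete variable such as educ
# (assumption: these are not the lalonde values).
set.seed(1)
x_small <- sample(3:16, size = 200, replace = TRUE)
x_big   <- rep(x_small, times = 101)  # same relative frequencies, 101x the rows

bw_small <- bw.nrd0(x_small)
bw_big   <- bw.nrd0(x_big)

# The rule-of-thumb bandwidth scales with n^(-1/5), so the ratio should be
# close to 101^(-1/5).
c(small = bw_small, big = bw_big,
  ratio = bw_big / bw_small, expected = 101^(-1/5))
```

Passing the smaller group's bandwidth as `bw` to both densities removes this sample-size effect from the comparison.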

"Smoothness" is not a parameter; it is the result of the estimated bandwidth. You can use adjust to scale the bandwidth by a multiplier, which here increases the smoothness of both groups:

ggplot(bigll, aes(x = educ, fill = factor(treat))) + 
  geom_density(alpha = .5, adjust = 2)

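As a quick check of what adjust does, the sketch below uses stats::density() directly on simulated data (an assumption for illustration). It relies on geom_density handing bw and adjust to the same kind of kernel density computation, which is how I understand the ggplot2 documentation, so treat it as an illustration rather than a statement about ggplot2 internals.

```r
set.seed(1)
x <- rnorm(200)  # simulated data, used only for illustration

d1 <- density(x)              # default rule bw = "nrd0", adjust = 1
d2 <- density(x, adjust = 2)  # same rule, bandwidth multiplied by 2

# The bandwidth actually used is stored in the `bw` element of the result.
c(default = d1$bw, adjusted = d2$bw, ratio = d2$bw / d1$bw)
```

Because adjust is a multiplier on whatever bandwidth the rule picks, it makes both groups smoother but does not by itself make their bandwidths equal.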

Following that logic, you can plot each group separately and apply a different multiplier for each one:

ggplot() + 
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 0),
    fill = '#EB675F', alpha = .5,
    adjust = 3) +
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 1),
    fill = '#35C1C4', alpha = .5,
    adjust = 1.5)


This is a simplistic solution. Check this post for suggestions on how to use a better function to calculate values for each group: Understanding bandwidth smoothing in ggplot2

But be cautious about doing this when analyzing your data. The greater roughness after you replicate one of the groups is a correct reflection of the change you've made. A sample consisting of (2,4,6) is not the same thing as (2,2,2,2,4,4,4,4,6,6,6,6). In the first case, there is a good chance that intermediate values exist but were not sampled. In the second, there is a high chance that the data really occur only at those discrete intervals.
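The (2,4,6) example can be checked with a small base R sketch. It only compares the default bw.nrd0 bandwidths of the two samples; the variable names are made up for illustration.

```r
x_once <- c(2, 4, 6)
x_rep  <- rep(x_once, each = 4)  # 2,2,2,2,4,4,4,4,6,6,6,6

bw_once <- bw.nrd0(x_once)
bw_rep  <- bw.nrd0(x_rep)

# The replicated sample gets the narrower bandwidth, so its density estimate
# concentrates more tightly around 2, 4 and 6.
c(once = bw_once, replicated = bw_rep)
```

A narrower bandwidth is the estimator's way of saying that, with more observations piled on the same three values, there is less reason to spread mass between them.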

