I have a dataset that contains a continuous variable for which I want to display the density and a grouping variable that I want to use to split the density. When the sizes of the groups are similar, the density plot comes out fine:
library(ggplot2)
data("lalonde", package = "cobalt")
ggplot(lalonde, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5)
Now, let's say my groups are of different sizes, but each group still has the same relative frequencies for each variable. In the example below, I append 100 extra copies of the control group (treat == 0) while keeping the treated group as it was.
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))
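To check that the replication changes only the size of the control group and not the shape of its distribution, you can compare the relative frequencies of `educ` within that group before and after. A minimal sketch (it re-creates `lalonde` and `bigll` so it runs on its own, and assumes the cobalt package is installed):

```r
data("lalonde", package = "cobalt")
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))

# Group sizes: the control group is now 101 times its original size
table(lalonde$treat)
table(bigll$treat)

# Relative frequencies of educ within the control group, before and after
p_orig <- prop.table(table(lalonde$educ[lalonde$treat == 0]))
p_big  <- prop.table(table(bigll$educ[bigll$treat == 0]))
all.equal(as.vector(p_orig), as.vector(p_big))
```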
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5)
It appears much less smooth. Is there a way to adjust the smoothness parameters by group so that the second plot would look more like the first? That is, can I set the smoothness parameters to the lowest common denominator so that the densities can be compared visually more easily?
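The extra roughness comes from the bandwidth. `geom_density()` uses `bw = "nrd0"` by default and computes it separately for each group, and `bw.nrd0()` is `0.9 * min(sd, IQR / 1.34) * n^(-1/5)`, so it shrinks as the group gets larger. A sketch comparing the default bandwidth of each group before and after replication (assumes the cobalt package is installed):

```r
data("lalonde", package = "cobalt")
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))

# Default (nrd0) bandwidth of educ, computed separately for each treat group
bw_orig <- tapply(lalonde$educ, lalonde$treat, bw.nrd0)
bw_big  <- tapply(bigll$educ,  bigll$treat,  bw.nrd0)

# The control (0) bandwidth drops after replication; the treated (1) one does not change
rbind(original = bw_orig, replicated = bw_big)
```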
With the help of @Carlos and others, I found what I was looking for. It's true that the smoothness of the density should typically reflect the size of the sample, as Carlos mentioned, but in my case what I wanted was for the two densities to have the same bandwidth; in particular, I wanted it to be that of the smaller group. The default bandwidth rule in ggplot2 is bw.nrd0; I can apply it to the smaller group and then set the result as the global bandwidth for my plot.
bw <- bw.nrd0(bigll$educ[bigll$treat == 1])
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5, bw = bw)
That definitely obscures some of the detail in the larger distribution, but for my purposes this was sufficient.
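A possible variant, so the smaller group doesn't have to be picked by hand: compute the default bandwidth for every group and pass the largest one as the shared `bw`. This is a sketch under the assumption that "smoothest default" is what you want; it is usually, but not necessarily, the smaller group's bandwidth, because `bw.nrd0()` also depends on each group's spread.

```r
library(ggplot2)
data("lalonde", package = "cobalt")
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))

# Per-group default bandwidths; the largest is the smoothest of the defaults
group_bw  <- tapply(bigll$educ, bigll$treat, bw.nrd0)
common_bw <- max(group_bw)

# Use that single numeric bandwidth for every group
p <- ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5, bw = common_bw)
print(p)
```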
"Smoothness" is not a parameter; it is the result of the estimated bandwidth. You can use adjust to multiply the bandwidth, which increases the smoothness of both groups:
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
  geom_density(alpha = .5, adjust = 2)
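To see that `adjust` is just a multiplier on the rule-based bandwidth, you can check it directly with `stats::density()`, which, as far as I know, is what ggplot2's density stat calls with the `bw` and `adjust` you pass. A base-R sketch on simulated data:

```r
set.seed(1)
x <- rnorm(200)

# density() applies the bandwidth rule first, then multiplies it by adjust
d1 <- density(x, bw = "nrd0")
d2 <- density(x, bw = "nrd0", adjust = 2)

c(rule = bw.nrd0(x), adjust_1 = d1$bw, adjust_2 = d2$bw)
all.equal(d2$bw, 2 * bw.nrd0(x))
```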
Following that logic, you can plot each group separately and apply a different multiplier for each one:
ggplot() +
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 0),
    fill = '#EB675F', alpha = .5,
    adjust = 3) +
  geom_density(
    aes(x = educ),
    data = subset(bigll, treat == 1),
    fill = '#35C1C4', alpha = .5,
    adjust = 1.5)
This is a simplistic solution. Check this post for suggestions on how to use a better function to calculate the bandwidth for each group: Understanding bandwidth smoothing in ggplot2
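Another option is to compute each group's density yourself with `stats::density()`, which lets you pick any bandwidth per group, and then plot the resulting curves. A sketch, assuming a hypothetical helper `group_density()` and, purely for illustration, giving both groups the treated group's `nrd0` bandwidth (replace the `bws` values with whatever rule you prefer):

```r
library(ggplot2)
data("lalonde", package = "cobalt")
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
  lalonde[lalonde$treat == 0, ], simplify = FALSE)))

# Hypothetical helper: density of x on a fixed grid, returned as a data frame
group_density <- function(x, bw, from, to, n = 512) {
  d <- density(x, bw = bw, from = from, to = to, n = n)
  data.frame(x = d$x, y = d$y)
}

# One bandwidth per group, named by treat level
bw_small <- bw.nrd0(bigll$educ[bigll$treat == 1])
bws <- c("0" = bw_small, "1" = bw_small)

# Evaluate every group on the same grid so the curves line up
rng <- range(bigll$educ)
dens <- do.call("rbind", lapply(names(bws), function(g) {
  out <- group_density(bigll$educ[bigll$treat == as.numeric(g)],
                       bws[[g]], rng[1], rng[2])
  out$treat <- g
  out
}))

p <- ggplot(dens, aes(x = x, fill = treat)) +
  geom_ribbon(aes(ymin = 0, ymax = y), alpha = .5)
print(p)
```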
But be cautious about doing this when analyzing your data. The greater roughness after you replicate one of the groups is a correct reflection of the change you've made. A group of data formed by (2,4,6) is not the same thing as (2,2,2,2,4,4,4,4,6,6,6,6). In the first case, there is a good chance that intermediate values exist that simply were not sampled. In the second, there is a high chance that the data occur only at a few discrete values.
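The default bandwidth rule encodes exactly this: `bw.nrd0()` treats the replicated vector as a larger sample and returns a narrower bandwidth, even though both vectors contain the same three values in the same proportions. A minimal base-R illustration:

```r
small <- c(2, 4, 6)
big   <- rep(c(2, 4, 6), each = 4)   # (2,2,2,2,4,4,4,4,6,6,6,6)

bw.nrd0(small)   # about 1.08
bw.nrd0(big)     # about 0.93, so a rougher density estimate
```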