ggplot mixture model R

I have a dataset with a numeric variable and a categorical variable. The distribution of the numeric variable differs between categories. I want to plot a density curve for each category so that each one lies visually below the density curve of the entire dataset.

This is similar to plotting the components of a mixture model, but without fitting the mixture model (I already know the categorical variable that splits the data).

If I let ggplot group by the categorical variable, each of the four densities (the overall one and one per species) is a true density and integrates to one.

library(ggplot2)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density() +
  geom_density(aes(group = Species, colour = Species))

[image]

What I want is for the density of each category to be a sub-density (not integrating to 1), similar to the following code (which I only implemented for two of the three iris species):

library(data.table)  # provides as.data.table() and the [i, j] syntax used below
library(ggplot2)
myIris <- as.data.table(iris)
# calculate density for entire dataset
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire[[1]], y = dens_entire[[2]])

# calculate density for dataset with setosa
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa[[1]], y = dens_setosa[[2]])

# calculate density for dataset with versicolor
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor[[1]], y = dens_versicolor[[2]])

# plot densities as mixture model
ggplot(dens_e, aes(x=x, y=y)) + geom_line() + geom_line(data = dens_sa, aes(x = x, y = y/2.5, colour = 'setosa')) + 
  geom_line(data = dens_v, aes(x = x, y = y/1.65, colour = 'versicolor'))

resulting in

[image]

Above, I hard-coded the factors that scale down the y values. Is there a way to do this with ggplot, or to calculate the factors?

Thanks for your ideas.

Do you mean something like this? Note that the y-axis then shows counts rather than densities, so you need to change the scale.

ggplot(iris, aes(x = Sepal.Width)) + 
  geom_density(aes(y = ..count..)) + 
  geom_density(aes(x = Sepal.Width, y = ..count.., 
               group = Species, colour = Species))
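
The relative heights come out right with `..count..` because the count is the density multiplied by the number of observations in the group. Dividing by the total number of rows therefore gives each species' density weighted by its share of the data. A minimal sketch of calculating those weights by hand in base R follows; the names `n_by_species`, `weights` and `sub_dens` are illustrative and not part of the question, and `cut = 0` is copied from the question's code.

```r
# Number of rows per species, as a named integer vector
n_by_species <- vapply(split(iris$Sepal.Width, iris$Species), length, integer(1))

# Mixture weight of each species: its share of all rows
weights <- n_by_species / nrow(iris)

# Kernel density per species, scaled down by that species' weight
sub_dens <- do.call(rbind, lapply(names(weights), function(sp) {
  d <- density(iris$Sepal.Width[iris$Species == sp], cut = 0)
  data.frame(x = d$x, y = d$y * weights[[sp]], Species = sp)
}))
```

The resulting `sub_dens` data frame could then replace the hard-coded `y/2.5` and `y/1.65` lines in the question, for example with `geom_line(data = sub_dens, aes(x = x, y = y, colour = Species))`.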

Another option may be

ggplot(iris, aes(x = Sepal.Width)) + 
   geom_density(aes(y = ..density..)) + 
   geom_density(aes(x = Sepal.Width, y = ..density../3, 
                    group = Species, colour = Species))
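
Dividing by 3 is exact here only because each species contributes 50 of the 150 rows. For groups of unequal size, a sketch that combines both options is to divide the computed count by the total row count, which keeps the y-axis in density units and weights each group by its share. This assumes ggplot2 3.3.0 or later, where `after_stat()` is the newer spelling of the `..count..` notation; `n_total` is an illustrative name.

```r
library(ggplot2)

# Total number of rows; every group's count is divided by this
n_total <- nrow(iris)

p <- ggplot(iris, aes(x = Sepal.Width)) +
  # Overall curve: count / n_total equals the ordinary density
  geom_density(aes(y = after_stat(count / n_total))) +
  # Per-species curves, each weighted by that species' share of the rows
  geom_density(aes(y = after_stat(count / n_total), colour = Species)) +
  ylab("density")

print(p)
```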

