
ggplot mixture model R

I have a dataset with a numeric variable and a categorical variable. The distribution of the numeric variable differs between categories. I want to plot a density for each category so that these curves sit visually below the density of the entire dataset.

This is similar to plotting the components of a mixture model without actually fitting the mixture model (I already know the categorical variable that splits the data).

If I let ggplot group by the categorical variable, each of the four densities (the overall one plus one per species) is a proper density and integrates to one:

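library(ggplot2)
# overall density plus one density per species, each integrating to 1
# (colour = Species maps the variable; colour = 'Species' would draw all three in one colour)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density() +
  geom_density(aes(group = Species, colour = Species))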
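To check the "integrates to one" claim numerically, here is a small sketch using base R's density() and a trapezoid rule. trapezoid() is a hypothetical helper defined only for this illustration. Each per-species area should come out close to 1, though not exactly, because the curve is evaluated on a finite grid.

```r
# Hypothetical helper: trapezoid-rule area under a curve given on a grid
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Area under each species' own density estimate
areas <- sapply(split(iris$Sepal.Width, iris$Species), function(v) {
  d <- density(v)  # default cut = 3 extends the grid 3 bandwidths past the data
  trapezoid(d$x, d$y)
})
print(round(areas, 3))
```

Because every group is normalised separately, the species curves are drawn at the same height scale as the overall curve, which is why they do not sit beneath it.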


What I want is for each category's density to be drawn as a sub-density (one that does not integrate to 1), similar to the following code (which I only implemented for two of the three iris species):

library(data.table)
library(ggplot2)

myIris <- as.data.table(iris)
# calculate density for entire dataset (cut = 0 limits the grid to the data range)
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire$x, y = dens_entire$y)

# calculate density for the setosa subset
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa$x, y = dens_setosa$y)

# calculate density for the versicolor subset
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor$x, y = dens_versicolor$y)

# plot densities as mixture components; the divisors 2.5 and 1.65 are hard-coded
ggplot(dens_e, aes(x = x, y = y)) +
  geom_line() +
  geom_line(data = dens_sa, aes(x = x, y = y / 2.5, colour = 'setosa')) +
  geom_line(data = dens_v, aes(x = x, y = y / 1.65, colour = 'versicolor'))

This results in a plot of the overall density with the setosa and versicolor curves scaled down beneath it.

Above I hard-coded the divisors that scale down the y values. Is there a way to do this with ggplot, or to calculate the factors?
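One way to calculate the factors instead of hard-coding them is to treat each species as a mixture component whose weight is its share of the observations, n_g / N, and multiply that group's density by the weight. The sketch below uses base R plus ggplot2 and assumes all curves are evaluated on one common grid spanning the whole data range; the variable names are illustrative. For iris every species has 50 of the 150 rows, so every weight is 1/3.

```r
library(ggplot2)

x <- iris$Sepal.Width
g <- iris$Species

# Common evaluation grid: the range of the whole dataset
lo <- min(x)
hi <- max(x)
total <- density(x, from = lo, to = hi)

# Mixture weight of each species = its share of the observations (n_g / N)
weights <- sapply(levels(g), function(lev) mean(g == lev))

# Each component: the group's own density multiplied by its weight
components <- do.call(rbind, lapply(levels(g), function(lev) {
  d <- density(x[g == lev], from = lo, to = hi)
  data.frame(Species = lev, x = d$x, y = d$y * weights[[lev]])
}))

p <- ggplot(data.frame(x = total$x, y = total$y), aes(x = x, y = y)) +
  geom_line() +
  geom_line(data = components, aes(colour = Species))
print(p)
```

Because each group gets its own bandwidth, the weighted components add up to the overall density only approximately, not exactly.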

Thanks for your ideas.

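Do you mean something like this? Note that it changes the y scale: count is the density multiplied by the number of observations in the group, so the y axis shows counts rather than densities.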

# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(count))) +
  geom_density(aes(y = after_stat(count),
                   group = Species, colour = Species))

Another option:

ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(density))) +
  # dividing by 3 works here because iris has 3 species of 50 observations each
  geom_density(aes(y = after_stat(density / 3),
                   group = Species, colour = Species))
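The division by 3 relies on the three species having equal sizes. A sketch that drops that assumption divides each group's count (density times group size) by the total number of rows, so each curve becomes density * n_g / N while the overall curve stays on the density scale. n_total is a name introduced here for illustration, and after_stat() assumes ggplot2 >= 3.3.0. For iris the result is identical to dividing by 3.

```r
library(ggplot2)

# Total number of observations across all species
n_total <- nrow(iris)

p <- ggplot(iris, aes(x = Sepal.Width)) +
  # overall curve: an ordinary density (integrates to 1)
  geom_density() +
  # per-species curves: count / N = density * n_g / N (weighted components)
  geom_density(aes(y = after_stat(count / n_total), colour = Species))
print(p)
```

With unequal group sizes, a larger species would then produce a proportionally larger sub-density, which matches the mixture-model picture from the question.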

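Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.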

 
© 2020-2024 STACKOOM.COM