
Creating a density histogram in ggplot2?

I want to create the following density histogram with ggplot2. With the base packages it is really easy:

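set.seed(46)
vector <- rnorm(500)
# Decile breaks, so the bins have unequal widths
breaks <- quantile(vector, seq(0, 1, by = 0.1))
labels <- 1:(length(breaks) - 1)
den <- density(vector)
# df is only created further down, so plot the vector directly
hist(vector,
     breaks = breaks,
     col = rainbow(length(breaks)),
     probability = TRUE)
lines(den)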


With ggplot I have got this far:

seg <- cut(vector,breaks,
           labels=labels,
           include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)

library(ggplot2)

ggplot(df) +
     geom_histogram(breaks=breaks,
                    aes(x=vector,
                        y=..density..,
                        fill=seg)) +
     geom_density(aes(x=vector,
                      y=..density..))

But the "y" scale has the wrong dimension. I have noticed that the next run gets the "y" scale right.

ggplot(df) +
     geom_histogram(breaks=breaks,
                    aes(x=vector,
                        y=..density..)) +   # same call, without fill=seg
     geom_density(aes(x=vector,
                      y=..density..))

I just do not understand it. y=..density.. is there, and that should be the height. So why on earth does my scale change when I try to fill the bars?

I do need the colours. I just want a histogram where the breaks are set directly and each block is coloured with the default ggplot fill colours.

I added the colours to your percentile bars manually. See if this works for you.

library(ggplot2)

ggplot(df, aes(x=vector)) +   
   geom_histogram(breaks=breaks, aes(y=..density..), colour="black",
                  fill=c("red","orange","yellow","lightgreen","green",
                         "darkgreen","blue","darkblue","purple","pink")) + 
   geom_density(aes(y=..density..)) +
   scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
   ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
   theme_bw() + theme(plot.title=element_text(size=20),
                      axis.title.y=element_text(size = 16, vjust=+0.2),
                      axis.title.x=element_text(size = 16, vjust=-0.2),
                      axis.text.y=element_text(size = 14),
                      axis.text.x=element_text(size = 14),
                      panel.grid.major = element_blank(),
                      panel.grid.minor = element_blank())


fill=seg results in grouping. You are actually getting a separate histogram for each value of seg, each normalised within its own group, which is why the y scale changes. If you don't need the colours, you could use this:

ggplot(df) + 
  geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") + 
  geom_density(aes(x=vector,y=..density..))
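
To see why grouping changes the y scale, here is a small base R sketch that mimics the normalisation by hand. It assumes the question's data (same seed and decile breaks) and assumes that a density is the bin count divided by the group size times the bin width; the names dens_all and dens_grouped are only illustrative.

```r
# Recreate the question's data (same seed and decile breaks).
set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))
seg <- cut(vector, breaks, labels = 1:(length(breaks) - 1),
           include.lowest = TRUE, right = TRUE)

widths <- diff(breaks)            # bin widths (unequal, since breaks are deciles)
counts <- as.vector(table(seg))   # observations falling in each bin

# Ungrouped: every bin is normalised by the total sample size.
dens_all <- counts / (length(vector) * widths)

# Grouped by seg: each group holds only the points of its own bin,
# so the group size equals the bin count and the height becomes 1 / width.
dens_grouped <- counts / (counts * widths)

round(data.frame(width = widths, ungrouped = dens_all, grouped = dens_grouped), 3)
```

The grouped column is simply 1 / width for every bar, which is much taller than the ungrouped density and no longer comparable with the kernel density curve.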


If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
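
One possible way to do that, as a minimal sketch rather than the answerer's own code: bin the whole sample once with hist(plot = FALSE), then draw the bars with geom_rect so that fill cannot regroup the data. The data frame names bars and kde are made up for this illustration, and the data are assumed to be the question's.

```r
library(ggplot2)

# Recreate the question's data (same seed and decile breaks).
set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

# Bin once over the whole sample, so the heights are densities of the full data.
h <- hist(vector, breaks = breaks, plot = FALSE)
bars <- data.frame(
  xmin    = head(h$breaks, -1),
  xmax    = tail(h$breaks, -1),
  density = h$density,
  seg     = factor(seq_along(h$density))   # one fill level per bar
)

# Kernel density estimate, also computed outside ggplot2.
den <- density(vector)
kde <- data.frame(x = den$x, y = den$y)

p <- ggplot() +
  geom_rect(data = bars,
            aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
            colour = "black") +
  geom_line(data = kde, aes(x = x, y = y)) +
  labs(x = "vector", y = "Density")
p
```

Because the bars are plain rectangles, the fill mapping only picks a colour from the default discrete palette and has no effect on the heights.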

Or an option with ggpubr:

library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
   palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA", 
   "#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)

The confusion about the y-axis may arise because density is plotted rather than count. The y-axis values are therefore not counts: each bar's height is its share of the total sample divided by the bar's width, so the areas of the bars sum to 1.
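
A quick base R check of how density heights relate to proportions, assuming the question's data and decile breaks:

```r
# Recreate the question's data (same seed and decile breaks).
set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

h <- hist(vector, breaks = breaks, plot = FALSE)
widths <- diff(h$breaks)

proportions <- h$counts / sum(h$counts)  # share of the sample in each bar
heights     <- proportions / widths      # density = proportion / bar width

all.equal(heights, h$density)  # matches the densities hist() reports
sum(h$density * widths)        # bar areas add up to 1 (up to floating-point error)
sum(h$density)                 # bar heights alone generally do not
```

With equal-width bins the heights are just the proportions scaled by a constant, but with quantile breaks like these each bar is scaled by its own width.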


 