Let ggplot2 histogram show classwise percentages on y axis

library(ggplot2)
data = diamonds[, c('carat', 'color')]
data = data[data$color %in% c('D', 'E'), ]

I would like to compare the histogram of carat across colors D and E, and use the classwise percentage on the y-axis. The solutions I have tried are as follows:

Solution 1:

ggplot(data = data, aes(carat, fill = color)) +
  geom_histogram(aes(y = ..density..), position = 'dodge', binwidth = 0.5) +
  ylab("Percentage") + xlab("Carat")

This is not quite right, since the y-axis shows the height of the estimated density.

Solution 2:

ggplot(data = data, aes(carat, fill = color)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), position = 'dodge', binwidth = 0.5) +
  ylab("Percentage") + xlab("Carat")

This is also not what I want, because the denominator used to calculate the ratio on the y-axis is the total count of D + E.

Is there a way to display the classwise percentages with ggplot2's stacked histogram? That is, instead of showing (# of obs in bin)/count(D+E) on the y axis, I would like it to show (# of obs in bin)/count(D) and (# of obs in bin)/count(E) respectively for the two color classes. Thanks.

Calculating from stats

You can scale the counts within each group by using the special stat variables group and count, with group selecting the subset of count that belongs to each class.

If you have ggplot2 3.3.0 or newer, you can use the after_stat function to access these special variables:

ggplot(data, aes(carat, fill=color)) +
  geom_histogram(
    aes(y=after_stat(c(
      count[group==1]/sum(count[group==1]),
      count[group==2]/sum(count[group==2])
    )*100)),
    position='dodge',
    binwidth=0.5
  ) +
  ylab("Percentage") + xlab("Carat")

[Figure: ggplot of carat versus percentage, with two sets of bars, each showing the percentage within the given color, as desired]
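The expression above hard-codes groups 1 and 2. As a sketch of one way to generalize it to any number of groups (this is not from the original answer), base R's ave() can normalize count within each group while keeping the row order. The names p, built and group_totals are only illustrative, and layer_data(p) is used as a shortcut for ggplot_build(p)$data[[1]] to check the result.

```r
library(ggplot2)

# Same subset as in the question
data <- diamonds[diamonds$color %in% c("D", "E"), c("carat", "color")]

p <- ggplot(data, aes(carat, fill = color)) +
  geom_histogram(
    # ave() applies the function within each group and returns the
    # results in the original row order, so no group index is hard-coded
    aes(y = after_stat(ave(count, group, FUN = function(x) x / sum(x)) * 100)),
    position = "dodge",
    binwidth = 0.5
  ) +
  ylab("Percentage") + xlab("Carat")

# Sanity check: the bars of each group should add up to 100
built <- layer_data(p)
group_totals <- tapply(built$y, built$group, sum)
print(group_totals)
```

If a third color were added to the subset, the same aes() expression should still apply without modification, which is the main reason to prefer it over indexing by group number.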

Using older versions of ggplot2

In earlier versions, this is more cumbersome. If you have at least 3.0, you can wrap the stat() function around each individual variable reference; in pre-3.0 versions you have to surround each variable with two dots instead:

aes(y=c(
  ..count..[..group..==1]/sum(..count..[..group..==1]),
  ..count..[..group..==2]/sum(..count..[..group..==2])
)*100),

Yeah, but what are all these stats?

For more details on where these variables come from: the computed stats are documented alongside the stat function being used. For example, geom_histogram's default stat_bin() has this Computed variables section:

Computed variables:

  • count: number of points in bin
  • density: density of points in bin, scaled to integrate to 1
  • ncount: count, scaled to maximum of 1
  • ndensity: density, scaled to maximum of 1
  • width: widths of bins

Beyond that, you can use ggplot_build() to inspect all the stats generated for any given plot:

> p = ggplot(data, [...etc...])
> ggplot_build(p)
$data
$data[[1]]
        fill           y count      x  xmin xmax      density       ncount
1  #440154FF  1.50553506   102 -0.125 -0.25 0.00 0.0301107011 0.0224323730
2  #440154FF 67.11439114  4547  0.375  0.25 
[...snip...]
       ndensity flipped_aes PANEL group ymin        ymax colour size linetype
1  0.0224323730       FALSE     1     1    0  1.50553506     NA  0.5        1
2  1.0000000000       FALSE     1     1    0 67.11439114     NA  0.5        1
[...snip...]
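To make the relationship between these computed variables concrete, here is a small sketch that recomputes density and ncount from count, one group at a time, and compares them with what ggplot2 reports. It assumes a fixed binwidth and uses position = "identity" so that dodging does not shift the bins; stats, group_n and group_max are illustrative names, and layer_data(p) is a shortcut for ggplot_build(p)$data[[1]].

```r
library(ggplot2)

data <- diamonds[diamonds$color %in% c("D", "E"), c("carat", "color")]
bw <- 0.5

p <- ggplot(data, aes(carat, fill = color)) +
  geom_histogram(binwidth = bw, position = "identity")

stats <- layer_data(p)

# Per-group totals and maxima, repeated on every row of the group
group_n   <- ave(stats$count, stats$group, FUN = sum)
group_max <- ave(stats$count, stats$group, FUN = max)

# density = count / (group total * bin width); ncount = count / group maximum
density_ok <- isTRUE(all.equal(stats$density, stats$count / group_n / bw))
ncount_ok  <- isTRUE(all.equal(stats$ncount, stats$count / group_max))
print(c(density = density_ok, ncount = ncount_ok))
```

Because the stats are computed separately for each group, density multiplied by the bin width gives the within-class proportion, which is exactly the quantity the question asks for.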

It seems that binning the data outside of ggplot2 is the way to go. But I would still be interested to see if there is a way to do it with ggplot2.

library(dplyr)
breaks = seq(0,4,0.5)

data$carat_cut = cut(data$carat, breaks = breaks)

data_cut = data %>%
  group_by(color, carat_cut) %>%
  summarise (n = n()) %>%
  mutate(freq = n / sum(n))

ggplot(data = data_cut, aes(x = carat_cut, y = freq * 100, fill = color)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_x_discrete(labels = breaks) +
  ylab("Percentage") + xlab("Carat")
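As a cross-check on the binned percentages, the same classwise figures can be sketched in base R with table() and prop.table(). This is an alternative to the dplyr pipeline above, not part of the original answer; counts and pct are illustrative names, and droplevels() is needed because color keeps its unused levels after subsetting.

```r
library(ggplot2)  # only used here for the diamonds data set

data <- diamonds[diamonds$color %in% c("D", "E"), c("carat", "color")]
data$color <- droplevels(data$color)  # drop the unused colors F to J
breaks <- seq(0, 4, 0.5)

# Rows are colors, columns are carat bins
counts <- table(data$color, cut(data$carat, breaks = breaks))

# margin = 1 divides each row by its own total, i.e. by count(D) or count(E)
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 2))
```

Each row of pct should sum to 100, which confirms that the denominator is the per-color count rather than the combined count of D and E.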

Fortunately, in my case, Rorschach's answer worked perfectly. I came here looking to avoid the solution proposed by Megan Halbrook, which is the one I was using until I realized it is not a correct solution.

Adding a density line to the histogram automatically changes the y axis to frequency density, not to percentage. The values of frequency density match the per-bin proportions (percentage / 100) only if binwidth = 1.

Googling: To draw a histogram, first find the class width of each category. The area of the bar represents the frequency, so to find the height of the bar, divide frequency by the class width. This is called frequency density. https://www.bbc.co.uk/bitesize/guides/zc7sb82/revision/9

Below is an example, where the left panel shows percentage and the right panel shows density on the y axis.

library(ggplot2)
library(gridExtra)

TABLE <- data.frame(vari = c(0,1,1,2,3,3,3,4,4,4,5,5,6,7,7,8))

## selected binwidth
bw <- 2

## plot using count
plot_count <- ggplot(TABLE, aes(x = vari)) + 
   geom_histogram(aes(y = ..count../sum(..count..)*100), binwidth = bw, col =1) 
## plot using density
plot_density <- ggplot(TABLE, aes(x = vari)) + 
   geom_histogram(aes(y = ..density..), binwidth = bw, col = 1)

## visualize together
grid.arrange(ncol = 2, grobs = list(plot_count,plot_density))

## visualize the values
data_count <- ggplot_build(plot_count)
data_density <- ggplot_build(plot_density)

## using ..count../sum(..count..)*100 the values of the y axis are the same as
## density * binwidth * 100. This is because density shows the "frequency density".
## all.equal() is used instead of == to tolerate floating-point rounding.
isTRUE(all.equal(data_count$data[[1]]$y, data_count$data[[1]]$density * bw * 100))
## using ..density.. the values of the y axis are the "frequency densities".
isTRUE(all.equal(data_density$data[[1]]$y, data_density$data[[1]]$density))


## manually calculated percentage for each range of the histogram. Note that
## geom_histogram uses right-closed intervals.
min_range_of_intervals <- data_count$data[[1]]$xmin

for(i in min_range_of_intervals)
  cat(paste("Values >",i,"and <=",i+bw,"involve a percent of",
            sum(TABLE$vari>i & TABLE$vari<=(i+bw))/nrow(TABLE)*100),"\n")

# Values > -1 and <= 1 involve a percent of 18.75 
# Values > 1 and <= 3 involve a percent of 25 
# Values > 3 and <= 5 involve a percent of 31.25 
# Values > 5 and <= 7 involve a percent of 18.75 
# Values > 7 and <= 9 involve a percent of 6.25 
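The same percentages can be reproduced independently of ggplot2 with base R's hist(), which also uses right-closed intervals by default. This is only a sketch for cross-checking; the breaks are chosen by hand to match the bins listed above, and vari, h, pct_from_counts and pct_from_density are illustrative names.

```r
vari <- c(0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8)
bw <- 2

# Bins (-1,1], (1,3], (3,5], (5,7], (7,9], as in the manual calculation
h <- hist(vari, breaks = seq(-1, 9, by = bw), plot = FALSE)

# Percentage computed directly from the counts
pct_from_counts <- h$counts / sum(h$counts) * 100

# Percentage recovered from the frequency density: density * binwidth * 100
pct_from_density <- h$density * bw * 100

print(pct_from_counts)
# 18.75 25.00 31.25 18.75  6.25
```

Both vectors agree, which illustrates why density only reads as a proportion when the bin width is 1.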

When I tried Rorschach's answer, it didn't work for me, for reasons that weren't readily apparent. I wanted to comment that, if you are open to adding density lines to a histogram, doing so will automatically change the y axis to percent.

For example, I have a count of "doses" by a binary outcome (0, 1).

This code produces the following graph:

ggplot(data, aes(x=siadoses, fill=recallbin, color=recallbin)) +
  geom_histogram(binwidth=1, alpha=.5, position='identity') 

[Figure: histogram 1]

But when I add a density plot to my ggplot code and add y=..density.., I get this plot with percent on the Y axis:

ggplot(data, aes(x=siadoses, fill=recallbin, color=recallbin)) +
  geom_histogram(aes(y=..density..), binwidth=1, alpha=.5, position='identity') +
  geom_density(alpha=.2)

[Figure: histogram 2]

Kind of a workaround to your original question, but I thought I would share.
