Frequency count histogram displaying only integer values on the y-axis?

I'd much appreciate anyone's help to resolve this question please. It seems like it should be so simple, but after many hours experimenting, I've had to stop in and ask for help. Thank you very much in advance!

Summary of question:

How can one ensure that, in ggplot2, the y-axis of a histogram is labelled using only integers (frequency count values) and not decimals?

The functions, arguments and datatype changes tried so far include:

  • geom_histogram(), geom_bar() and geom_col() - in each case with and without the argument stat = "identity" where relevant.
  • adding + scale_y_discrete(), with or without + scale_x_discrete()
  • converting the underlying count data to a factor and/or the bin data to a factor

Ideally, the solution would use base R or ggplot2 rather than additional external dependencies, e.g. the pretty_breaks() function in the scales package, or similar.

Sample data:

sample <- data.frame(binMidPts = c(4500,5500,6500,7500), counts = c(8,0,9,3))

The x-axis consists of bins of a continuous variable, and the y-axis is intended to show the count of observations in those bins. For example, Bin 1 covers the x-axis range [4000 <= x < 5000], has a mid-point of 4500, and has 8 data points observed in that bin / range.

Code that almost works:

The following code generates a graph similar to the one I'm seeking; however, the y-axis breaks are labelled with decimal values (which aren't valid, as the data are integer counts).

ggplot(data = sample, aes (x = binMidPts, y = counts)) + geom_col()

Graph produced by this code is: [Image: simple geom_col plot with an 'incorrect' continuous y-axis]

I realise I could hard-code the breaks / labels onto a scale_y_continuous() axis but (a) I'd prefer a flexible solution to apply to many differently sized datasets where the scale isn't known in advance, and (b) I expect there must be a simpler way to generate a basic histogram.

References

I've consulted many Stack Overflow questions, the ggplot2 manual ( https://ggplot2.tidyverse.org/reference/scale_discrete.html ), the sthda.com examples and various blogs. These tend to address related problems, e.g. using scale_y_continuous, or cases where count data is not available in the underlying dataset and which therefore rely on stat_bin() for a transformation.

Any help would be much appreciated. Thank you.

// Update 1 - Extending scale to zero

Future readers of this thread may find it helpful to know that the range of break values formed by base::pretty() does not necessarily extend to zero. Thus, the axis scale may omit values between zero and the lowest break, as shown here: [Image: y-axis breaks omit values below the lower limit of pretty()]

To resolve this, I included 0 in the arguments to range(), i.e.:

ggplot(data = sample, aes (x = binMidPts, y = counts)) + geom_col() +
    scale_y_continuous(breaks=round(pretty(range(0,sample$counts))))

which gives the desired full scale on the y-axis, thus:

[Image: y-axis scale extended to zero]
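To see the effect described in this update without drawing a plot, the two pretty() calls can be compared directly. This is only a small sketch: the counts vector is made-up illustration data (not the sample data above), chosen so that its smallest value is well above zero.

```r
# Hypothetical counts whose smallest value is well above zero.
counts <- c(12, 15, 19)

# Breaks computed from the observed range only: they start near min(counts).
without_zero <- pretty(range(counts))

# Passing 0 to range() forces the lower end of the range down to zero.
with_zero <- pretty(range(0, counts))

print(without_zero)
print(with_zero)
```

The first vector starts near the smallest count, while the second starts at zero, which is why the axis in the second plot is labelled all the way down.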

How about:


ggplot(data = sample, aes (x = binMidPts, y = counts)) + geom_col() +
    scale_y_continuous( breaks=round(pretty( range(sample$counts) )) )


This answer suggests pretty_breaks from the scales package. The manual page of pretty_breaks mentions pretty from base. From there you just have to round the result to the nearest integer.
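Since the question asks for something that works across datasets whose scale isn't known in advance, one possible variation is to pass a function to breaks instead of a precomputed vector; ggplot2 then calls it with the axis limits. The helper name integer_breaks below is hypothetical (it is not part of ggplot2 or base R), and it keeps only whole-number breaks rather than rounding them, so this is a sketch of the idea rather than the answer's exact method.

```r
library(ggplot2)

sample <- data.frame(binMidPts = c(4500, 5500, 6500, 7500), counts = c(8, 0, 9, 3))

# Hypothetical helper: receives the axis limits from ggplot2 and returns
# only the whole-number candidates produced by base::pretty().
# Including 0 makes the candidate breaks extend down to zero.
integer_breaks <- function(limits) {
  b <- pretty(c(0, limits))
  b[abs(b - round(b)) < 1e-8]
}

ggplot(data = sample, aes(x = binMidPts, y = counts)) +
  geom_col() +
  scale_y_continuous(breaks = integer_breaks)
```

Filtering instead of rounding avoids duplicated breaks when the range is small, at the cost of possibly showing fewer labels.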

Or you can calculate the breaks with rules customized to the dataset you are working with, like this:

library(ggplot2)

breaks_min <- 0
breaks_max <- max(sample[["counts"]])
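# Assume about 5 breaks is preferable; keep the step at 1 or more so that
# seq() never receives a zero step when the maximum count is small
breaks_bin <- max(1, round((breaks_max - breaks_min) / 5))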
custom_breaks <- seq(breaks_min, breaks_max, breaks_bin)

ggplot(data = sample, aes (x = binMidPts, y = counts)) + 
  geom_col() +
  scale_y_continuous(breaks = custom_breaks, expand = c(0, 0))

Created on 2021-04-28 by the reprex package (v2.0.0)

The default y-axis breaks are calculated with scales::extended_breaks(). This function factory has a ... argument that passes arguments on to labeling::extended, which has a Q argument for what it considers 'nice numbers'. If you omit the 2.5 from the default, you should get integer breaks when the range is 3 or larger.

library(ggplot2)
library(scales)

sample <- data.frame(binMidPts = c(4500,5500,6500,7500), counts = c(8,0,9,3))

ggplot(data = sample, aes (x = binMidPts, y = counts)) + 
  geom_col() +
  scale_y_continuous(
    breaks = extended_breaks(Q = c(1, 5, 2, 4, 3))
  )

Created on 2021-04-28 by the reprex package (v1.0.0)
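Because extended_breaks() returns a function, the breaks it would choose can be inspected by calling that function on a range directly, without building a plot. The sketch below assumes the scales package is installed and uses c(0, 9), the range of the sample counts, as input.

```r
library(scales)

# Break function with 2.5 removed from the list of 'nice numbers'.
integer_friendly <- extended_breaks(Q = c(1, 5, 2, 4, 3))

# Apply it to the range of the sample counts (0 to 9).
b <- integer_friendly(c(0, 9))
print(b)

# For comparison: the default break function, which still allows 2.5.
print(extended_breaks()(c(0, 9)))
```

Comparing the two printed vectors shows whether dropping 2.5 is enough for a given range before committing to it in a plot.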
