ggplot2: yearmon scale and geom_bar

More than a solution, I'd like to understand why something that should be quite easy actually is not.

[I am borrowing part of the code from a different post that touched on the issue but ended up with a solution I didn't like.]

library(ggplot2)
library(xts)     # attaches zoo, which provides as.yearmon() and scale_x_yearmon()
library(dplyr)
library(scales)

csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"

tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)

### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) + 
  geom_bar() + 
  scale_x_yearmon()

### Almost good but long-winded and ticks not great
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) + 
  geom_col() + 
  scale_x_yearmon()

The first plot is all wrong; the second is almost perfect (the ticks on the X axis are not great, but I can live with that). Isn't geom_bar() supposed to do the counting that I have to perform manually for the second chart?

FIRST CHART [image: bad plot]

SECOND CHART [image: better plot]

My question is: why is the first chart so poor? There is a warning that is meant to suggest something ("position_stack requires non-overlapping x intervals"), but I really fail to understand it. Thanks.

MY PERSONAL ANSWER

This is what I learned (thanks so much to all of you!):

  • Even though there is a scale_#_yearmon and a scale_#_date, unfortunately ggplot treats those object types as continuous numbers. That makes geom_bar unusable here.
  • geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics.
  • Bottom line: you need to group/sum before you chart.
  • I am not sure xts or lubridate are really that useful for what I was trying to achieve (if you plan to use ggplot2). I suspect that for any continuous case, date-wise, they will be perfect.
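As a rough illustration of the first point (a sketch in base R only, not code from the thread; the variable names are made up for the example): a Date is stored as a count of days since 1970-01-01, so month-start dates sit 28 to 31 units apart on a continuous axis.

```r
# A Date is a plain number underneath: days since 1970-01-01.
month_starts <- seq(as.Date("2015-10-01"), by = "month", length.out = 4)

# Strip the class to see the stored values.
day_numbers <- unclass(month_starts)
print(day_numbers)

# Gaps between consecutive month starts, in days (not a constant step).
gaps <- diff(day_numbers)
print(gaps)
```

Because the step between months is not constant, anything that derives a bar width from the raw numbers works in days rather than in "months".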

All in all, I ended up with this, which does exactly what I am after (notice that there is no need for xts or lubridate):

library(ggplot2)
library(dplyr)
library(scales)

csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"

tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)

### GOOD
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()

ggplot(chartData, aes(x = yearmon, y = count, fill = status)) + 
  geom_col() + 
  scale_x_date(labels = date_format("%h-%y"),
               breaks = seq(from = min(chartData$yearmon), 
                            to = max(chartData$yearmon), by = "month"))

FINAL OUTPUT [image: final plot]

The first plot is broken basically because ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a num with labels.

> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917

So when you plot without aggregating first, the bars are spread out. You need to assign an appropriate binwidth using geom_histogram(), like this:

ggplot(tmp, aes(x = yearmon, fill = status)) + 
  geom_histogram(binwidth = 1/12) + 
  scale_x_yearmon()

A binwidth of 1/12 corresponds to one month, since a yearmon advances by 1/12 per month (12 months in each year).
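To see where the 1/12 comes from, here is a small check (a sketch, assuming the zoo package is installed; the three dates are taken from the sample data above). A yearmon is stored as year + (month - 1) / 12, so consecutive months differ by exactly one twelfth.

```r
library(zoo)

# Three consecutive months from the sample data.
ym <- as.yearmon(as.Date(c("2015-10-17", "2015-11-24", "2015-12-03")))

# The stored values: year + (month - 1) / 12.
ym_num <- as.numeric(ym)
print(ym_num)

# The step between consecutive months, compared with 1/12.
steps <- diff(ym_num)
print(steps)
print(all.equal(steps, c(1 / 12, 1 / 12)))
```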

For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, since I know better how to change ticks and modify axis labels with it.

library(lubridate)  # provides floor_date() and months()

chartData <- tmp %>%
  mutate(ym = floor_date(dt,"month")) %>%
  group_by(ym, status) %>%
  summarise(count = n()) %>%
  as.data.frame()

ggplot(chartData, aes(x = ym, y = count, fill = status)) + 
  geom_col() + 
  scale_x_date(labels = date_format("%Y-%m"),
               breaks = as.Date("2015-09-01") + 
                 months(seq(0, 10, by = 2)))
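If the group_by/summarise step feels long-winded, one possible shortcut is dplyr::count(), which does the grouping and counting in a single call. This is only a sketch and not part of the original answers: the inline data frame rebuilds the sample data, and the month is floored with base R's format() so that lubridate is not needed.

```r
library(dplyr)

# The sample data from the question, rebuilt inline.
tmp <- data.frame(
  dt = as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                 "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                 "2016-03-31", "2015-12-31")),
  status = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)

# Floor each date to the first of its month, then count rows per month/status.
chartData <- tmp %>%
  mutate(ym = as.Date(format(dt, "%Y-%m-01"))) %>%
  count(ym, status, name = "count")

print(chartData)
```

The result has the same ym, status and count columns that the geom_col() call above expects.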

You can also use aes(x = factor(yearmon), ...) as a quick fix.
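A minimal sketch of that quick fix, assuming ggplot2 and zoo are installed (the inline data frame and the axis label are just for illustration). Turning the yearmon into a factor makes the x axis discrete, so geom_bar() can do the counting itself.

```r
library(ggplot2)
library(zoo)

# The sample data from the question, rebuilt inline.
tmp <- data.frame(
  dt = as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                 "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                 "2016-03-31", "2015-12-31")),
  status = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)
tmp$yearmon <- as.yearmon(tmp$dt)

# A factor x gives a discrete axis: one stacked bar per month that has data.
p <- ggplot(tmp, aes(x = factor(yearmon), fill = status)) +
  geom_bar() +
  labs(x = "yearmon")

print(p)
```

One trade-off to keep in mind: with a discrete axis, months that have no rows (January 2016, for example) do not appear at all, and the bars are evenly spaced regardless of the time between them.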
