ggplot2: yearmon scale and geom_bar
More than a solution, I'd like to understand why something that should be quite easy actually is not.
[I am borrowing part of the code from a different post that touched on the issue but ended up with a solution I didn't like.]
library(ggplot2)
library(xts)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)
### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_bar() +
scale_x_yearmon()
### Almost good but long-winded and ticks not great
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_yearmon()
The first plot is all wrong; the second is almost perfect (the ticks on the X axis are not great, but I can live with that). Isn't geom_bar() supposed to do the counting that I have to perform manually for the second chart?
My question is: why is the first chart so poor? There is a warning that is meant to suggest something ("position_stack requires non-overlapping x intervals"), but I really fail to understand it. Thanks.
MY PERSONAL ANSWER
This is what I learned (thanks so much to all of you!):
Even though there is a scale_#_yearmon or scale_#_date, unfortunately ggplot treats those object types as continuous numbers. That makes geom_bar unusable. geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics. All in all, I ended up with this, which does exactly what I am after (notice that there is no need for xts or lubridate):
library(ggplot2)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)
### GOOD
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%h-%y"),
breaks = seq(from = min(chartData$yearmon),
to = max(chartData$yearmon), by = "month"))
The reason the first plot goes wrong is basically that ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a num with labels.
> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917
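To make that numeric representation concrete, here is a minimal base-R sketch. It assumes, consistent with the output above, that a yearmon value is stored as year + (month - 1) / 12; the helper name to_yearmon_num is hypothetical and not part of zoo.

```r
# Hypothetical helper (not part of zoo): reproduce the number that a
# yearmon holds for a given date, assuming year + (month - 1) / 12.
to_yearmon_num <- function(d) {
  d <- as.Date(d)
  year  <- as.integer(format(d, "%Y"))
  month <- as.integer(format(d, "%m"))
  year + (month - 1) / 12
}

dates <- as.Date(c("2015-12-03", "2015-11-24", "2015-10-17", "2016-06-30"))
ym_num <- to_yearmon_num(dates)

# Rounded to three decimals, these are the kind of values shown by
# as.numeric(tmp$yearmon).
print(round(ym_num, 3))

# Adjacent months sit 1/12 of a year apart on this continuous scale.
month_gap <- to_yearmon_num("2015-12-01") - to_yearmon_num("2015-11-01")
print(month_gap)
```

Seen this way, the x axis is a continuous number line on which each month occupies a slot 1/12 wide, which is why a bin width has to be chosen explicitly.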
So when you plot without the prior aggregation, the bars are spread out. You need to assign an appropriate binwidth using geom_histogram(), like this:
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_histogram(binwidth = 1/12) +
scale_x_yearmon()
1/12 corresponds to one month, since there are 12 months in each year.
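As a rough check of what a one-month bin width should produce, the counts per month and status can be tabulated in base R. This is only a sketch: it assumes the bins line up with calendar months, whereas the exact bin edges ggplot2 picks depend on its own centering and boundary settings.

```r
# The sample data from the question, as plain vectors.
dt <- as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                "2016-03-31", "2015-12-31"))
status <- factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))

# One bin per calendar month: each "%Y-%m" label stands for an interval
# that is 1/12 of a year wide.
month_label <- format(dt, "%Y-%m")

# Rows per (month, status): the heights that a month-wide stacked
# histogram is expected to show.
counts <- table(month = month_label, status = status)
print(counts)
```

If the plotted bar heights differ from this table, the bins are probably straddling month boundaries.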
For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, as I know better how to change ticks and modify axis labels with it.
library(lubridate)  # provides floor_date() and the months() period helper
chartData <- tmp %>%
mutate(ym = floor_date(dt,"month")) %>%
group_by(ym, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = ym, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%Y-%m"),
breaks = as.Date("2015-09-01") +
months(seq(0, 10, by = 2)))
You can also use aes(x = factor(yearmon), ...) as a quick fix.
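The factor(yearmon) shortcut can be sketched as follows. This is a minimal example, assuming ggplot2 and zoo are installed; the data frame is rebuilt inline from the question's sample data so the snippet runs on its own.

```r
library(ggplot2)
library(zoo)  # provides as.yearmon()

tmp <- data.frame(
  dt = as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                 "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                 "2016-03-31", "2015-12-31")),
  status = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)
tmp$yearmon <- as.yearmon(tmp$dt)

# Converting yearmon to a factor makes the x axis discrete, so geom_bar()
# counts the rows for each month and stacks them by status.
p <- ggplot(tmp, aes(x = factor(yearmon), fill = status)) +
  geom_bar() +
  labs(x = "month")
print(p)
```

The trade-off is that a discrete axis only shows months that occur in the data, so gaps such as January and February 2016 are not drawn and the spacing is no longer proportional to time.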