
How to adjust x-axis in ggplot's density plot?

I am trying to get an overview of the hourly frequency of my data with respect to the weekday. To do that, I condensed the different dates into one single day so that only the time differs, and added a column that represents the day of the week as an ordered factor.

The following is an extract of my data:

my.log <- structure(list(Prorated = structure(c(1339535400, 1339536540, 1339524540, 1339480320, 1339537920, 1339529580, 1339500780, 1339532820, 1339522020, 1339522680, 1339465560, 1339529940, 1339472880, 1339508520, 1339519620, 1339536000, 1339526580, 1339514940, 1339518060, 1339512420, 1339513080, 1339500120, 1339543620, 1339485660, 1339496280, 1339526520, 1339514820, 1339531800, 1339531860, 1339501320), class = c("POSIXct", "POSIXt"), tzone = "%Y-%m-%d %H:%M:%S"), Wday = structure(c(1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 3, 4, 1, 1, 3, 3, 4, 4, 5, 5, 5, 1, 2, 2, 2), .Label = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), class = c("ordered", "factor"))), .Names = c("Prorated", "Wday"), row.names = c(NA, 30), class = "data.frame")

range(my.log$Prorated)
# here (n = 30):
# [1] "2012-06-12 01:46:00" "2012-06-12 23:27:00"
# w/ full data set (n = approx. 75000):
# [1] "2012-06-12 00:00:00" "2012-06-12 23:59:00"

When I now try to plot a density plot with the following code...

library("ggplot2")
library("scales")
p <- ggplot(my.log) + theme_bw() +
  geom_density(aes(Prorated, colour=Wday)) +
  scale_color_brewer("weekday", palette="Dark2") +
  scale_x_datetime("", breaks=date_breaks("4 hours"),
    labels=date_format("%H:00")) +
  opts(title="Distribution (KDE)")
print(p)

... the x-axis for both data sets does not start at 00:00 but at 02:00, and as a result the whole density plot is shifted into the next day. (I wanted to post an image here, but since I am new to SO I am not allowed to do so. You can find it at ImageShack.)

Thus, my question: Is there an option to tell ggplot() that it should start its density plot at 00:00?

I checked SO for related questions (and their answers) but could not find any. The only options that come to mind are xlim() and scale_x_continuous(limits=...). But as far as I understand them, neither is the right one here.

The former would drop data points (or not, since all data in the input data.frame is already in the correct range), while the latter would just shift the viewport and so cut off the graph at 23:59 without moving the now-hidden data points to the beginning. So, when I use

scale_x_datetime("", breaks=date_breaks("4 hours"), labels=date_format("%H:00"),
  limits=c(as.POSIXct("2012-06-12 00:00:00"), as.POSIXct("2012-06-12 23:59:00")))

in the code above, the graph looks wrong and does not show all the data.

It's a timezone issue. See this related question: What is the appropriate timezone argument syntax for scale_datetime() in ggplot 0.9.0
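To see why the axis appears shifted, it may help to format the earliest and latest timestamps from the question in two time zones. This is only a base R sketch: "Europe/Berlin" is an assumed stand-in for the asker's local zone, which the question does not state.

```r
# Earliest and latest timestamps from the question's sample, in epoch seconds.
secs <- c(1339465560, 1339543620)
x <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Read as UTC, the values match the range() output shown in the question.
utc_labels <- format(x, "%H:%M", tz = "UTC")

# Assumed local zone (UTC+2 in June): every label moves two hours later,
# so the last observation already falls on the next day.
local_labels <- format(x, "%H:%M", tz = "Europe/Berlin")

print(utc_labels)
print(local_labels)
```

If the labels are rendered in a zone two hours ahead of UTC, an axis that really starts at 00:00 UTC would be labelled 02:00, which is consistent with what the question describes.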

You can work around it by changing the labels argument to function(x) format(x, "%H:00", tz="UTC") (or possibly some other appropriate timezone). I had to change your example data, since the POSIXct column of the data frame had a malformed tzone attribute (a format string instead of a time zone name).

ggplot(my.log) + theme_bw() +
  geom_density(aes(Prorated, colour=Wday)) +
  scale_color_brewer("weekday", palette="Dark2") +
  scale_x_datetime("", breaks=date_breaks("4 hours"),
    labels=function(x) format(x,"%H:00",tz="UTC")) +
  opts(title="Distribution (KDE)")
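The answer does not show how the sample data was changed. One possible repair, sketched here on a two-element stand-in for my.log$Prorated, is to overwrite the tzone attribute with a valid zone name. "UTC" is an assumption based on the range() output in the question, and day_breaks is a hypothetical vector used only to check the label function.

```r
# Stand-in for my.log$Prorated with the same malformed tzone attribute.
prorated <- structure(c(1339465560, 1339543620),
                      class = c("POSIXct", "POSIXt"),
                      tzone = "%Y-%m-%d %H:%M:%S")

# Replace the format string with a real time zone name (assumed: UTC).
attr(prorated, "tzone") <- "UTC"

# Same label function as in the plot code above.
hour_label <- function(x) format(x, "%H:00", tz = "UTC")

# Hypothetical 4-hour breaks spanning the day, built explicitly in UTC.
day_breaks <- seq(as.POSIXct("2012-06-12 00:00:00", tz = "UTC"),
                  by = "4 hours", length.out = 7)

print(format(prorated, "%H:%M"))
print(hour_label(day_breaks))
```

On the real data frame the same assignment would presumably be attr(my.log$Prorated, "tzone") <- "UTC". With the labels formatted in UTC, the first break reads 00:00 rather than 02:00.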

