
Fill a heat map (24h by 7 days) in ggplot2

I have bike data that looks like this; the data frame is large.

> dim(All_2014)
[1] 994367     10
> head(All_2014)
  X bikeid end.station.id start.station.id diff.time            stoptime           starttime
1 1  16379            285              356    338387 2014-01-02 15:22:28 2014-01-06 13:22:15
2 2  16379            361              146     47631 2014-01-09 22:45:34 2014-01-10 11:59:25
3 3  16379            268              327      5089 2014-01-10 12:35:22 2014-01-10 14:00:11
4 4  16379            398              324    715924 2014-01-22 14:34:55 2014-01-30 21:26:59
5 5  15611            536              445    716031 2014-01-02 15:30:44 2014-01-10 22:24:35
6 6  15611            348              433     68544 2014-01-12 14:03:01 2014-01-13 09:05:25
              midtime Hour      Day
1 2014-01-04 14:22:21   14 Saturday
2 2014-01-10 05:22:29    5   Friday
3 2014-01-10 13:17:46   13   Friday
4 2014-01-26 18:00:57   18   Sunday
5 2014-01-06 18:57:39   18   Monday
6 2014-01-12 23:34:13   23   Sunday

My aim is to create a heat map using ggplot2 (or another package, if one is better suited) where day of the week is on the y-axis and hour is on the x-axis. The hour does not have to be in AM/PM; it can remain as is on the 24-hour scale.

The fill of each box is a percentage: the number of rides taken within a given hour interval divided by the total number of rides on that day of the week. I have managed to get this far with the data, but would like to know the easiest way to compute the percentages and then how to create a heat map from them.

Using dplyr to do the calculations and ggplot2 to draw the chart:

library(dplyr)
library(ggplot2)


## First simulate some data
rider_num <- 1:10000
days <- factor(c("Sun", "Mon", "Tues", "Wed", "Thur", "Fri", "Sat"), 
               levels = rev(c("Sun", "Mon", "Tues", "Wed", "Thur", "Fri", "Sat")), 
               ordered = TRUE)

day <- sample(days, 10000, TRUE, 
              c(0.3, 0.5, 0.8, 0.8, 0.6, 0.5, 0.2))
hour <- round(rbeta(10000, 1, 2, 6) * 23)
df <- data.frame(rider_num, hour, day)

## Use dplyr functions to summarize on days and hours to get the 
## percentage of riders per hour each day:
df2 <- df %>% 
  group_by(day, hour) %>% 
  summarise(n=n()) %>% 
  mutate(percent_of_riders=n/sum(n)*100)

## Plot using ggplot and geom_tile, tweaking colours and theme elements
## to your liking:
ggplot(df2, aes(hour, day)) + 
  geom_tile(aes(fill = percent_of_riders), colour = "white") + 
  scale_fill_distiller(palette = "YlGnBu", direction = 1) +
  scale_x_continuous(breaks = 0:23) +   # hour is numeric, so use a continuous scale
  theme_minimal() +
  theme(legend.position = "bottom", legend.key.width = unit(2, "cm"),
        panel.grid = element_blank()) + 
  coord_equal()

[Image: heat map]
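The same summary can be applied to the Hour and Day columns shown in the question. The following is only a sketch under assumptions: `rides` is a small hypothetical stand-in for All_2014 (only the two relevant columns), and `percent_of_rides` is a column name chosen for illustration.

```r
library(dplyr)

# Hypothetical stand-in for All_2014: only the Hour and Day columns matter here
rides <- data.frame(
  Hour = c(14, 5, 13, 18, 18, 23, 5, 13),
  Day  = c("Saturday", "Friday", "Friday", "Sunday",
           "Monday", "Sunday", "Friday", "Friday")
)

# Full weekday names in calendar order, reversed so that Sunday ends up
# at the top of the y-axis, as in the simulated example
week <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
rides$Day <- factor(rides$Day, levels = rev(week))

pct <- rides %>%
  count(Day, Hour) %>%                               # rides per day/hour cell
  group_by(Day) %>%
  mutate(percent_of_rides = n / sum(n) * 100) %>%    # share within each day
  ungroup()
```

Within each day the percentages add up to 100, so `pct` can be passed to the ggplot call above with `aes(Hour, Day)` and `fill = percent_of_rides`.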

Using @andyteucher's df2, here's a lattice approach:

library(lattice)
library(RColorBrewer)
levelplot(percent_of_riders~hour+day, df2, 
          aspect='iso', xlab='', ylab='', border='white',
          col.regions=colorRampPalette(brewer.pal(9, 'YlGnBu')),
          at=seq(0, 12, length=100), # specify breaks for the colour ramp
          scales=list(alternating=FALSE, tck=1:0, x=list(at=0:23)))


One simple way to replace missing data (e.g. Sunday at midnight) with zero is to pass an xtabs object to levelplot instead:

levelplot(xtabs(percent_of_riders ~ hour+day, df2), aspect='iso', xlab='', ylab='',
          col.regions=colorRampPalette(brewer.pal(9, 'YlGnBu')),
          at=seq(0, 12, length=100),
          scales=list(alternating=FALSE, tck=1:0),
          border='white')
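To see what xtabs does with missing combinations, here is a minimal base-R sketch on a tiny made-up summary (the `small` data frame and its values are hypothetical, not taken from df2). It also converts the table back to long format, which might be useful if you want the zero-filled cells in the ggplot2 version as well.

```r
# Tiny hypothetical summary in which Monday at hour 0 never occurs
small <- data.frame(
  day  = factor(c("Sun", "Sun", "Mon"), levels = c("Mon", "Sun")),
  hour = c(0, 1, 1),
  percent_of_riders = c(40, 60, 100)
)

# xtabs() sums percent_of_riders within each hour/day cell and
# fills combinations that never occur with 0
xt <- xtabs(percent_of_riders ~ hour + day, small)

# Back to long format, e.g. for geom_tile(); hour comes back as a factor,
# so convert it to numeric again
filled <- as.data.frame(xt, responseName = "percent_of_riders")
filled$hour <- as.numeric(as.character(filled$hour))
```

The result has one row per hour/day combination, including the one that was absent from the input.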


You can also use d3heatmap for interactivity:

library(d3heatmap)
xt <- xtabs(percent_of_riders~day+hour, df2)
d3heatmap(xt[7:1, ], colors='YlGnBu', dendrogram = "none")
