
Time-series histogram

Is it possible to create a time-series histogram like the one described in this presentation (slides 36-39) using either R or D3.js? Or is there a better way to show bucketed data as a time series?

Edit: Here is some pre-bucketed sample data. Ideally, D3 or R would do the bucketing by itself. And yes, in case it wasn't clear, I understand that I could write this myself; I'm just wondering whether there's already a package that does this and I just haven't come across it yet. Thanks!

Here's a version in D3, modeled after @bdemarest's answer using ggplot2:

[Figure: D3 heatmap]

This version uses tiled rect elements. If you have a large dataset, you might get better performance from a pixel-based heatmap.
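A pixel-based heatmap draws one canvas pixel per bucket and lets the browser stretch the canvas, instead of creating one rect element per bucket. The following is a minimal sketch of the pixel-filling step in plain JavaScript, not necessarily how the linked pixel-based example does it. It assumes the counts are already in a rows x columns array with row 0 at the top, and uses the two colors from the ggplot2 answer in this thread as the ends of a linear RGB ramp. The names countsToRGBA and hexToRgb are hypothetical.

```javascript
// Parse "#RRGGBB" into [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Hypothetical helper: one RGBA pixel per grid cell, colored on a linear
// ramp from lowHex (smallest count) to highHex (largest count).
// grid[row][col] holds a count; row 0 is the top of the image.
// null/undefined cells are left fully transparent.
function countsToRGBA(grid, lowHex = "#F7FBFF", highHex = "#2171B5") {
  const rows = grid.length;
  const cols = grid[0].length;
  const low = hexToRgb(lowHex);
  const high = hexToRgb(highHex);

  // Find the count range with a loop (avoids spreading huge arrays).
  let min = Infinity;
  let max = -Infinity;
  for (const row of grid) {
    for (const v of row) {
      if (v == null) continue;
      if (v < min) min = v;
      if (v > max) max = v;
    }
  }
  const span = max - min || 1; // avoid dividing by zero if all counts match

  const data = new Uint8ClampedArray(rows * cols * 4); // starts all zeros
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const v = grid[r][c];
      if (v == null) continue; // stays transparent
      const t = (v - min) / span; // 0 at the min count, 1 at the max
      const i = (r * cols + c) * 4;
      for (let k = 0; k < 3; k++) {
        data[i + k] = Math.round(low[k] + t * (high[k] - low[k]));
      }
      data[i + 3] = 255; // opaque
    }
  }
  return { width: cols, height: rows, data };
}
```

In a browser, new ImageData(out.data, out.width, out.height) can be drawn with putImageData onto a canvas of exactly that size, and the canvas then stretched with CSS (image-rendering: pixelated keeps the cells sharp). The page holds a single canvas no matter how many buckets there are, which is where the savings for large datasets come from.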

If you want to compute the buckets using D3, you can use d3.nest to group the data by day and by value. There's also d3.layout.histogram, but since you presumably want uniformly spaced bins, and the same bins for every day, d3.nest should be sufficient.
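The two-level grouping described above (by day, then by value bucket) can be sketched without any library. This is a minimal illustration, assuming each raw observation looks like { time: Date, value: number }, that a "day" is a UTC midnight-to-midnight interval, and that bins are 100 units wide as in the sample data; bucketByDayAndValue is a hypothetical name. It returns rows shaped like the sample data (date, bucket, cnt).

```javascript
// Hypothetical helper: count observations per (UTC day, value bucket).
// Assumes obs = [{ time: Date, value: number }, ...].
function bucketByDayAndValue(obs, binWidth = 100) {
  const counts = new Map(); // key "YYYY-MM-DD|bucket" -> count
  for (const { time, value } of obs) {
    const day = time.toISOString().slice(0, 10); // UTC calendar day
    const bucket = Math.floor(value / binWidth) * binWidth; // floor, not round
    const key = `${day}|${bucket}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Flatten into rows like the sample data, sorted by date then bucket.
  return [...counts]
    .map(([key, cnt]) => {
      const [date, bucket] = key.split("|");
      return { date, bucket: Number(bucket), cnt };
    })
    .sort((a, b) => a.date.localeCompare(b.date) || a.bucket - b.bucket);
}
```

d3.nest and d3.layout.histogram belong to older D3 releases; in D3 v6 and later, d3.rollup(obs, v => v.length, dayKey, bucketKey) builds the same counts as nested Maps, and d3.bin takes over from the histogram layout.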

One subtle consideration: I placed the tick marks on the scale in between tiles so as to indicate visually how the values are binned. For example, the bottom-left bucket corresponds to all values from 800 up to (but not including) 900 on July 20, where July 20 is the midnight-to-midnight interval; at least, that's what I assumed from looking at your data. This is slightly clearer than labeling the middle of the rect because it indicates that the values are floored rather than rounded.
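The difference between floored and rounded bins, and why the ticks belong on tile edges, fits in a few lines. A sketch assuming 100-unit bins; the names BIN, floorBin, roundBin and edgeTicks are hypothetical.

```javascript
const BIN = 100; // assumed bin width, as in the sample data

// Floored binning: bucket b holds values in [b, b + BIN).
const floorBin = (v) => Math.floor(v / BIN) * BIN;
// Rounded binning, for contrast: bucket b holds values in [b - BIN/2, b + BIN/2).
const roundBin = (v) => Math.round(v / BIN) * BIN;

// With floored bins the tile for bucket b spans b..b+BIN, so ticks at every
// multiple of BIN (including one past the top bucket) land on tile boundaries.
function edgeTicks(minBucket, maxBucket) {
  const ticks = [];
  for (let t = minBucket; t <= maxBucket + BIN; t += BIN) ticks.push(t);
  return ticks;
}
```

A value of 850 lands in the 800 bucket under floored binning but in the 900 bucket under rounding, so a tick labeled 800 at the bottom edge of a tile is less ambiguous than an 800 label at its center.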

Here is one possible solution using R and ggplot2.

Your data, ready to paste into the R console:

dat = structure(list(date = structure(c(15541, 15541, 15541, 15541, 
    15541, 15541, 15541, 15541, 15541, 15541, 15541, 15541, 15541, 
    15541, 15541, 15541, 15541, 15542, 15542, 15542, 15542, 15542, 
    15542, 15542, 15542, 15542, 15542, 15542, 15542, 15542, 15542, 
    15542, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 
    15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 
    15543, 15543, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 
    15544, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 
    15544, 15544, 15544, 15544, 15544, 15545, 15545, 15545, 15545, 
    15545, 15545, 15545, 15545, 15545, 15545, 15545, 15545, 15545, 
    15545, 15545, 15545, 15545, 15546, 15546, 15546, 15546, 15546, 
    15546, 15546, 15546, 15546, 15546, 15546, 15546, 15546, 15546, 
    15546, 15546, 15546, 15547, 15547, 15547, 15547, 15547, 15547, 
    15547, 15547, 15547, 15547, 15547, 15547, 15547, 15547, 15547, 
    15547, 15547, 15547, 15547), class = "Date"), bucket = c(800L, 
    900L, 1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 
    1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 800L, 900L, 
    1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 
    1900L, 2000L, 2100L, 2200L, 900L, 1000L, 1100L, 1200L, 1300L, 
    1400L, 1500L, 1600L, 1700L, 1800L, 1900L, 2000L, 2100L, 2200L, 
    2300L, 2400L, 2500L, 2600L, 2800L, 800L, 900L, 1000L, 1100L, 
    1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 1900L, 2000L, 
    2100L, 2200L, 2300L, 2400L, 2500L, 2600L, 2700L, 2800L, 800L, 
    900L, 1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 
    1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 800L, 900L, 
    1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 
    1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 1300L, 1400L, 1500L, 
    1600L, 1700L, 1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 
    2500L, 2600L, 2700L, 2800L, 2900L, 3000L, 3200L), cnt = c(119L, 
    123L, 173L, 226L, 284L, 257L, 268L, 244L, 191L, 204L, 187L, 177L, 
    164L, 125L, 140L, 109L, 103L, 123L, 165L, 237L, 278L, 338L, 306L, 
    316L, 269L, 271L, 241L, 188L, 174L, 158L, 153L, 132L, 154L, 241L, 
    246L, 300L, 305L, 301L, 292L, 253L, 251L, 214L, 189L, 179L, 159L, 
    161L, 144L, 139L, 132L, 136L, 105L, 120L, 156L, 209L, 267L, 299L, 
    316L, 318L, 307L, 295L, 273L, 283L, 229L, 192L, 193L, 170L, 164L, 
    154L, 138L, 101L, 115L, 103L, 105L, 156L, 220L, 255L, 308L, 338L, 
    318L, 255L, 278L, 260L, 235L, 230L, 185L, 145L, 147L, 157L, 109L, 
    104L, 191L, 201L, 238L, 223L, 229L, 286L, 256L, 240L, 233L, 202L, 
    180L, 184L, 161L, 125L, 110L, 101L, 132L, 117L, 124L, 154L, 167L, 
    137L, 169L, 175L, 168L, 188L, 137L, 173L, 164L, 167L, 115L, 116L, 
    118L, 125L, 104L)), .Names = c("date", "bucket", "cnt"), 
    class = "data.frame", row.names = c(NA, -125L))
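The date column above uses R's Date class, which stores whole days since 1970-01-01. If you want to reuse these numbers on the D3 side, the conversion is a one-liner; a sketch in plain JavaScript, where rDateToISO is a hypothetical name.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Convert an R Date value (days since 1970-01-01) to a "YYYY-MM-DD" string.
function rDateToISO(days) {
  return new Date(days * MS_PER_DAY).toISOString().slice(0, 10);
}
```

For example, rDateToISO(15541) gives "2012-07-20", the July 20 mentioned in the D3 answer, and the last date in the data, 15547, is 2012-07-26.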

Plotting code:

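library(ggplot2)

# One tile per (date, bucket) pair, shaded by the count in that bucket.
# Note: geom_tile centers each tile on its (date, bucket) value.
plot_1 = ggplot(dat, aes(x=date, y=bucket, fill=cnt)) +
         geom_tile() +
         # Light-to-dark blue gradient for low-to-high counts
         scale_fill_continuous(low="#F7FBFF", high="#2171B5") +
         theme_bw()

# Write the plot to a 6 x 4 inch PNG
ggsave("plot_1.png", plot_1, width=6, height=4)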

The plot might look better if you include rows for zero-count buckets in your data, so that empty buckets are drawn as tiles at the bottom of the color scale instead of being left blank. Then you could change low="#F7FBFF" to low="white".
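Including zero rows means generating every combination of observed date and bucket step, then copying over the counts that already exist. A minimal sketch in plain JavaScript (the D3 side of this thread), assuming rows shaped like the sample data with dates as ISO strings such as "2012-07-20" and buckets aligned to a 100-unit grid; fillZeroBuckets is a hypothetical name. In R, tidyr's complete() together with full_seq() is one common way to do the same thing.

```javascript
// Hypothetical helper: for every date seen, emit one row per bucket from the
// lowest to the highest bucket seen, with cnt = 0 where no row existed.
function fillZeroBuckets(rows, binWidth = 100) {
  const dates = [...new Set(rows.map((r) => r.date))].sort(); // ISO strings sort by date
  let lo = Infinity;
  let hi = -Infinity;
  for (const r of rows) {
    lo = Math.min(lo, r.bucket);
    hi = Math.max(hi, r.bucket);
  }
  // Existing counts keyed by "date|bucket".
  const have = new Map(rows.map((r) => [`${r.date}|${r.bucket}`, r.cnt]));
  const out = [];
  for (const date of dates) {
    for (let b = lo; b <= hi; b += binWidth) {
      const key = `${date}|${b}`;
      out.push({ date, bucket: b, cnt: have.has(key) ? have.get(key) : 0 });
    }
  }
  return out;
}
```

With the sample data this would, for instance, add a zero row for the missing 2700 bucket on the third day.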

Put your numbers in a matrix and use image(mat)? That looks to be all it is: a grid, a raster. Or am I missing something?

There are also ways to do this with ggplot2, the raster package, and probably others.
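Building the matrix that image(mat) expects is a pivot from long (date, bucket, cnt) rows to a dense grid. A sketch of that step in plain JavaScript, assuming ISO date strings and filling absent cells with 0; toMatrix is a hypothetical name. It puts one row per date and one column per bucket because R's image() maps matrix rows to the x axis and columns to the y axis, which places dates along the bottom and buckets increasing upward.

```javascript
// Hypothetical helper: pivot { date, bucket, cnt } rows into a dense matrix
// with one row per date and one column per observed bucket (missing = 0).
function toMatrix(rows) {
  const dates = [...new Set(rows.map((r) => r.date))].sort();
  const buckets = [...new Set(rows.map((r) => r.bucket))].sort((a, b) => a - b);
  const dateIndex = new Map(dates.map((d, i) => [d, i]));
  const bucketIndex = new Map(buckets.map((b, j) => [b, j]));
  const matrix = dates.map(() => new Array(buckets.length).fill(0));
  for (const r of rows) {
    // += so that duplicate (date, bucket) rows are summed
    matrix[dateIndex.get(r.date)][bucketIndex.get(r.bucket)] += r.cnt;
  }
  return { dates, buckets, matrix };
}
```

The dates and buckets arrays double as axis labels, since the matrix itself carries no names.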
