R: time series with values

Question

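I have a log file with a date and a size (of files) on each line. I want to plot the bandwidth used per 1 minute and per 5 minutes. The input looks like this: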

2014-08-08 06:37:34.610    639205638
2014-08-08 06:37:37.110    239205638
2014-08-08 06:38:58.810    635899318
2014-08-08 06:38:21.877   1420094614
2014-08-08 06:40:11.772    140034211

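So I need to bin the values by date into 1-minute and 5-minute bins, sum each bin, average the sums by the number of minutes, and then plot them against time.

I have a feeling this has been done before, though, and that a generic plotting function could be used.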

Answer 1

You can do this easily with xts.

library(xts)

# read in the data
x <- read.table(text="2014-08-08 06:37:34.610    639205638
2014-08-08 06:37:37.110    239205638
2014-08-08 06:38:58.810    635899318
2014-08-08 06:38:21.877   1420094614
2014-08-08 06:40:11.772    140034211", stringsAsFactors=FALSE)

# convert to xts
xx <- xts(x[, 3], as.POSIXct(paste(x[,1], x[, 2])))

# find the 1 minute and 5 minute endpoints
ep1 <- endpoints(xx, "minutes", 1)
ep5 <- endpoints(xx, "minutes", 5)

period.sum(xx, ep1) # 1 minute sums
period.sum(xx, ep5) # 5 minute sums

More general (but slower):

period.apply(xx, ep1, sum)

For the last part of your question, take the mean of these results:

mean(period.sum(xx, ep1))
#[1] 1024813140
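Note that the mean above is taken only over minutes that contain at least one log entry; the idle minute 06:39 does not appear in the result. If "average by number of minutes" is meant to count idle minutes as zero traffic (an assumption about the question, not something it states), a minimal base R sketch might look like the following. The variable names and the UTC time zone are chosen here for illustration.

```r
# Sample data from the question: timestamps and sizes (stored as doubles).
times <- as.POSIXct(c("2014-08-08 06:37:34.610", "2014-08-08 06:37:37.110",
                      "2014-08-08 06:38:58.810", "2014-08-08 06:38:21.877",
                      "2014-08-08 06:40:11.772"), tz = "UTC")
bytes <- c(639205638, 239205638, 635899318, 1420094614, 140034211)

# One break per minute across the observed range, so empty minutes get a bin too.
first  <- as.POSIXct(trunc(min(times), "mins"))
last   <- as.POSIXct(trunc(max(times), "mins"))
breaks <- seq(first, last + 60, by = "1 min")
bins   <- cut(times, breaks = breaks, right = FALSE)

# tapply() returns NA for bins with no entries; treat those as zero traffic.
per_min <- tapply(bytes, bins, sum)
per_min[is.na(per_min)] <- 0

mean(per_min)   # average per minute, idle minutes included
per_min / 60    # average per second within each minute
```

With the sample data this yields four bins (06:37, 06:38, 06:39, 06:40), so the average is taken over four minutes rather than three.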

Answer 2

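It is not clear what "average by number of minutes" means, but ignoring that, this bins the data by 1 minute and by 5 minutes and plots the bins. Note that we have specified the data as "numeric" to avoid integer overflow. Omit facet = NULL if you want the series in separate panels: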

library(zoo)
library(ggplot2)    
library(scales)

# read data from the character variable Lines (defined below; run that assignment first)
z <- read.zoo(text = Lines, index = 1:2, tz = "",
          colClasses = c(NA, NA, "numeric"))

ag1 <- aggregate(z, as.POSIXct(cut(time(z), "min")), sum)
ag5 <- aggregate(z, as.POSIXct(cut(time(z), "5 min")), sum)

autoplot(na.approx(cbind(ag1, ag5)), facet = NULL) + 
   scale_x_datetime(breaks = "1 min", labels = date_format("%H:%M"))

Here is `Lines`:

Lines <- "2014-08-08 06:37:34.610    639205638
2014-08-08 06:37:37.110    239205638
2014-08-08 06:38:58.810    635899318
2014-08-08 06:38:21.877   1420094614
2014-08-08 06:45:11.772    140034211"
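The integer-overflow remark in this answer can be illustrated with a short base R sketch. It assumes the sizes were read as integers, which is what happens by default when every value fits in a 32-bit integer; the variable names are made up for the example.

```r
# Three sizes from the sample data, stored as integers.
sizes_int <- c(635899318L, 1420094614L, 639205638L)

# The total exceeds .Machine$integer.max (2147483647), so sum() gives NA
# together with an "integer overflow" warning.
int_total <- suppressWarnings(sum(sizes_int))
is.na(int_total)

# Converting to double first gives the exact total.
dbl_total <- sum(as.numeric(sizes_int))
dbl_total
```

Reading the column as "numeric" up front, as done with colClasses above, avoids the problem without a later conversion.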
