Make a cumulative plot of events in R
I have a list of events and their times. I'm able to plot a histogram of them using hist, but I don't know how to make a cumulative plot of them.
Here's the kind of data I'm starting with. (Assume it's already in POSIXct format.)
> events$time
[1] 2015-10-05 16:58:41.986797 2015-10-05 16:59:23.389583
[3] 2015-10-05 16:59:44.99402 2015-10-05 16:59:53.225178
[5] 2015-10-05 16:59:59.594524 2015-10-05 17:00:05.555564
[7] 2015-10-05 17:00:44.173783 2015-10-05 17:00:46.289552
[9] 2015-10-05 17:00:56.772485 2015-10-05 17:01:18.937458
[11] 2015-10-05 17:02:04.661378
and so on for ~8000 values
For instance, on my histogram, I have something like:
2015-10-05 4:00: 20 events
2015-10-05 4:15: 30 events
2015-10-05 4:30: 11 events
I want to get a tally like:
2015-10-05 4:00: 20 events
2015-10-05 4:15: 50 events
2015-10-05 4:30: 61 events
How do I do that?
A possible solution:
library(lubridate)

# example time data
time <- c(
  "2015-10-05 15:44:41.986797", "2015-10-05 15:59:23.389583", "2015-10-05 16:59:44.99402",
  "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402",
  "2015-10-05 17:59:59.594524", "2015-10-05 17:59:59.594524", "2015-10-05 18:00:05.555564"
)

# transform time strings to POSIXct objects for count
time <- ymd_hms(time)

# count by second
event <- data.frame(table(time))

# transform time factors to POSIXct objects for df
event$time <- ymd_hms(event$time)

# find start and end time for 15min sequence
start <- round(min(event$time), "mins")
if (min(event$time) < start) {
  minute(start) <- minute(start) - 1
}
while (minute(start) %% 15 != 0) {
  minute(start) <- minute(start) - 1
}
end <- round(max(event$time), "mins")
if (max(event$time) > end) {
  minute(end) <- minute(end) + 1
}
while (minute(end) %% 15 != 0) {
  minute(end) <- minute(end) + 1
}

# create sequence and result data.frame
ft.seq <- seq(start, end, "15 mins")
ft.event <- data.frame(
  start = ft.seq[1:(length(ft.seq) - 1)],
  end = ft.seq[2:(length(ft.seq))],
  sum = 0
)

# ugly, nested loop to attribute values to 15min time slices
# (slices are half-open, [start, end), so an event that falls exactly
# on an inner slice boundary is not dropped)
for (p1 in 1:nrow(ft.event)) {
  for (p2 in 1:nrow(event)) {
    if (event$time[p2] >= ft.event$start[p1] &&
        event$time[p2] < ft.event$end[p1]) {
      ft.event$sum[p1] <- ft.event$sum[p1] + event$Freq[p2]
    }
  }
}

# cumsum
ft.event$cumsum <- cumsum(ft.event$sum)

# example plot
library(ggplot2)
ggplot(ft.event) +
  geom_line(aes(x = end, y = cumsum))
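The nested loop above can probably be avoided. The following is only a sketch of one way to do the same 15-minute binning with vectorized base R, not part of the original answer: it assumes the timestamps are in UTC, uses findInterval() and tabulate() instead of the loop, and plots with base graphics rather than ggplot2. The names bin_secs, edges, bin and ft are made up for this illustration.

```r
# Same example timestamps as in the answer above (assumed to be UTC)
time <- as.POSIXct(c(
  "2015-10-05 15:44:41.986797", "2015-10-05 15:59:23.389583", "2015-10-05 16:59:44.99402",
  "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402",
  "2015-10-05 17:59:59.594524", "2015-10-05 17:59:59.594524", "2015-10-05 18:00:05.555564"
), tz = "UTC")

bin_secs <- 15 * 60  # width of one time slice in seconds

# Round the first event down and the last event up to a 15-minute boundary
start <- as.POSIXct(floor(as.numeric(min(time)) / bin_secs) * bin_secs,
                    origin = "1970-01-01", tz = "UTC")
end <- as.POSIXct(ceiling(as.numeric(max(time)) / bin_secs) * bin_secs,
                  origin = "1970-01-01", tz = "UTC")
edges <- seq(start, end, by = "15 min")

# Index of the slice [edges[i], edges[i + 1]) that each event falls into;
# rightmost.closed = TRUE keeps an event sitting exactly on the last edge
bin <- findInterval(as.numeric(time), as.numeric(edges), rightmost.closed = TRUE)

# Count events per slice (empty slices get 0), then accumulate
ft <- data.frame(
  start = edges[-length(edges)],
  end = edges[-1],
  count = tabulate(bin, nbins = length(edges) - 1)
)
ft$cumsum <- cumsum(ft$count)

# Step plot of the running total at the end of each slice
plot(ft$end, ft$cumsum, type = "s", xlab = "Time", ylab = "Cumulative events")
```

For the nine example events this gives the per-slice counts 1, 1, 0, 0, 0, 4, 0, 0, 0, 2, 1 and a running total that ends at 9.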
This is an old post, but the given answer is very long.
Use hist() (as the OP did), then call cumsum() on the counts in the resulting object.
Be careful about the start and end times in the hist object.
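As a small illustration of why the bin edges deserve care, here is a sketch with plain numbers instead of date-times (the vectors x and edges are invented for this example). By default hist() treats bins as closed on the right, so a value lying exactly on an edge is counted in the bin that ends there; with right = FALSE it moves to the bin that starts there. The per-bin counts change, but the final cumulative total does not.

```r
x <- c(0, 15, 15, 30)   # toy values, two of them exactly on an inner edge
edges <- c(0, 15, 30)   # two bins

# Default: bins are (0, 15] and (15, 30], with the lowest value included
right_closed <- hist(x, breaks = edges, plot = FALSE)$counts

# right = FALSE: bins are [0, 15) and [15, 30], with the highest value included
left_closed <- hist(x, breaks = edges, right = FALSE, plot = FALSE)$counts

right_closed          # 3 1
left_closed           # 1 3
cumsum(right_closed)  # 3 4
cumsum(left_closed)   # 1 4
```

The answer's full code, applied to the example timestamps, follows.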
library(tidyverse)
library(lubridate)

# example time data
time <- c(
  "2015-10-05 15:44:41.986797", "2015-10-05 15:59:23.389583", "2015-10-05 16:59:44.99402",
  "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402",
  "2015-10-05 17:59:59.594524", "2015-10-05 17:59:59.594524", "2015-10-05 18:00:05.555564"
)

# transform time strings to POSIXct objects for count
time <- ymd_hms(time)

# get start and end times, aligned to 15-minute boundaries
start_time <- min(time) %>% floor_date("15 minutes")
end_time <- max(time) %>% ceiling_date("15 minutes")
start_time
end_time

# get breaks for histogram
breaks <- seq(start_time, end_time, by = "15 min")

# create histogram object without drawing it
event_hist <- hist(time, breaks, freq = TRUE, plot = FALSE)

# organize results, calculate cumsum, all in a df
# (negative indexing replaces 1:length(x)-1, which only worked because
# the resulting index 0 is silently dropped)
n_breaks <- length(event_hist$breaks)
events_df <- data.frame(
  start = as_datetime(event_hist$breaks[-n_breaks], origin = "1970-01-01 00:00:00"),
  end = as_datetime(event_hist$breaks[-1], origin = "1970-01-01 00:00:00"),
  count = event_hist$counts,
  cumsum = cumsum(event_hist$counts)
)

## now graph
library(ggplot2)
ggplot(events_df) +
  geom_line(aes(x = end, y = cumsum))
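If the 15-minute bins are not required, a cumulative plot can also be drawn straight from the raw timestamps. This is an alternative sketch, not something either answer proposes: it sorts the events and plots the running event number against time using base graphics only. The name cum_count is invented here, and the timestamps are assumed to be in UTC.

```r
# Same example timestamps, sorted in time order (assumed to be UTC)
time <- sort(as.POSIXct(c(
  "2015-10-05 15:44:41.986797", "2015-10-05 15:59:23.389583", "2015-10-05 16:59:44.99402",
  "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402", "2015-10-05 16:59:44.99402",
  "2015-10-05 17:59:59.594524", "2015-10-05 17:59:59.594524", "2015-10-05 18:00:05.555564"
), tz = "UTC"))

# The k-th event in time order brings the running total to k
cum_count <- seq_along(time)

# Step plot: the line rises by one at each event, with no binning involved
plot(time, cum_count, type = "s", xlab = "Time", ylab = "Cumulative events")
```

With about 8000 events this stays cheap, since it only needs one sort; the binned versions are still the better fit when the tally has to line up with the histogram's 15-minute slices.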
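Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.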