Extract a date from a dataframe and plot a time series with R
I would like to plot a time series which presents the number of logs per hour. I first tried to split the date of every log from the dataframe in order to count the number of logs per hour.
I have the following dataframe:
[Fri Jun 1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk
[Fri Jun 1 16:29:29 1995] httpd: send aborted for ansc86024.usask.ca
[Fri Jun 1 16:31:42 1995] httpd: send aborted for 194.20.24.70
[Fri Jun 1 16:34:11 1995] httpd: send aborted for sw24-70.iol.it
[Fri Jun 1 16:41:02 1995] httpd: send aborted for educ026.usask.ca
[Fri Jun 1 16:41:13 1995] httpd: send aborted for educ026.usask.ca
[Fri Jun 1 16:41:13 1995] httpd: send aborted for sw24-70.iol.it
[Fri Jun 1 16:45:07 1995] httpd: send aborted for 128.233.18.38
[Fri Jun 1 17:26:50 1995] httpd: send aborted for pc117c.nwrel.org
[Fri Jun 1 17:46:53 1995] httpd: send aborted for geoff.usask.ca
[Fri Jun 2 17:57:09 1995] httpd: send aborted for piweba3y.prodigy.com
[Fri Jun 2 17:57:50 1995] httpd: send aborted for piweba3y.prodigy.com
[Fri Jun 2 18:10:15 1995] httpd: send aborted for 193.74.92.109
[Fri Jun 2 20:14:30 1995] httpd: send aborted for 128.233.13.41
[Fri Jun 2 20:15:59 1995] httpd: send aborted for peter.net4.io.org
[Fri Jun 2 21:11:54 1995] httpd: send aborted for ped374.usask.ca
I want to get a plot like the following, showing the number of logs per hour:
I tried to add the date column using the gsub function:
df$date <- gsub(".+[(.*)]","",df[0])
How about this:
# Data in form of a string vector
dat = c("[Fri Jun 1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk",
"[Fri Jun 1 16:29:29 1995] httpd: send aborted for ansc86024.usask.ca",
"[Fri Jun 1 16:31:42 1995] httpd: send aborted for 194.20.24.70",
"[Fri Jun 1 16:34:11 1995] httpd: send aborted for sw24-70.iol.it",
"[Fri Jun 1 16:41:02 1995] httpd: send aborted for educ026.usask.ca",
"[Fri Jun 1 16:41:13 1995] httpd: send aborted for educ026.usask.ca",
"[Fri Jun 1 16:41:13 1995] httpd: send aborted for sw24-70.iol.it",
"[Fri Jun 1 16:45:07 1995] httpd: send aborted for 128.233.18.38",
"[Fri Jun 1 17:26:50 1995] httpd: send aborted for pc117c.nwrel.org",
"[Fri Jun 1 17:46:53 1995] httpd: send aborted for geoff.usask.ca",
"[Fri Jun 2 17:57:09 1995] httpd: send aborted for piweba3y.prodigy.com",
"[Fri Jun 2 17:57:50 1995] httpd: send aborted for piweba3y.prodigy.com",
"[Fri Jun 2 18:10:15 1995] httpd: send aborted for 193.74.92.109",
"[Fri Jun 2 20:14:30 1995] httpd: send aborted for 128.233.13.41",
"[Fri Jun 2 20:15:59 1995] httpd: send aborted for peter.net4.io.org",
"[Fri Jun 2 21:11:54 1995] httpd: send aborted for ped374.usask.ca")
library(dplyr)
library(lubridate)
library(ggplot2)  # needed for ggplot() in the plotting step
Extract the date string:
# Drop the leading "[Fri " (5 characters) and everything from "]" onward
dat = data.frame(date.string = gsub(".{5}(.*)\\].*", "\\1", dat))
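The question's `gsub(".+[(.*)]","",df[0])` does not work for two reasons: `df[0]` selects zero columns of the data frame, and `[(.*)]` is a bracket expression that matches a single `(`, `.`, `*` or `)` character rather than a bracketed group. If the log lines are already stored in a data frame, as in the question, a minimal sketch is to run a capture-group substitution on that column directly. The column name `line` is hypothetical; the question does not say what the real column is called.

```r
# Hypothetical data frame: the column name `line` is an assumption.
df <- data.frame(line = c(
  "[Fri Jun 1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk",
  "[Fri Jun 1 16:29:29 1995] httpd: send aborted for ansc86024.usask.ca"
), stringsAsFactors = FALSE)

# Match "[", a three-letter weekday and a space, capture everything up to
# the closing "]", and replace the whole line with the captured group.
df$date <- sub("^\\[[A-Za-z]{3} ([^]]+)\\].*$", "\\1", df$line)
df$date
```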
Convert the date string to POSIXct datetime format:
dat$date = as.POSIXct(dat$date.string, format= "%b %e %H:%M:%S %Y")
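`%b` matches month abbreviations in the current `LC_TIME` locale, so on a system with a non-English locale `"Jun"` may fail to parse and `as.POSIXct()` returns `NA`. A sketch that parses in the C locale and restores the previous setting afterwards; pinning the time zone to UTC is an assumption, since the log lines do not state one.

```r
date.string <- c("Jun 1 15:56:37 1995", "Jun 2 21:11:54 1995")

# "%b" follows LC_TIME, so switch to the C locale (English month
# abbreviations) and remember the old setting to restore it afterwards.
old.locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
# tz = "UTC" is an assumption: the log lines carry no time zone.
parsed <- as.POSIXct(date.string, format = "%b %e %H:%M:%S %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old.locale))

format(parsed, "%Y-%m-%d %H:%M")
```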
Now, summarise by hour. We throw away the minutes and seconds so that we can then just group by date to get counts by hour:
datByHour = dat %>%
  # Rebuild each timestamp with minutes and seconds set to zero
  mutate(date = as.POSIXct(paste0(paste(year(date), month(date), day(date), sep = "-"),
                                  " ",
                                  paste(hour(date), "00:00", sep = ":")))) %>%
  group_by(date) %>%
  tally()
datByHour
                 date n
1 1995-06-01 15:00:00 1
2 1995-06-01 16:00:00 7
3 1995-06-01 17:00:00 2
4 1995-06-02 17:00:00 2
5 1995-06-02 18:00:00 1
6 1995-06-02 20:00:00 2
7 1995-06-02 21:00:00 1
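The paste-and-reparse step can also be expressed with lubridate's `floor_date()`, which rounds each timestamp down to the start of its hour, followed by dplyr's `count()`, which groups and tallies in one call. A sketch on a few of the timestamps from the question, assuming UTC; the column name `hour.start` is illustrative.

```r
library(dplyr)
library(lubridate)

dat <- data.frame(date = as.POSIXct(c("1995-06-01 15:56:37", "1995-06-01 16:29:29",
                                      "1995-06-01 16:31:42", "1995-06-02 17:57:09"),
                                    tz = "UTC"))

# Round every timestamp down to the hour, then count rows per hour.
datByHour <- dat %>%
  mutate(hour.start = floor_date(date, unit = "hour")) %>%
  count(hour.start)
datByHour
```

Besides being shorter, `floor_date()` keeps the time zone of the input, whereas rebuilding the timestamp from a pasted string reinterprets it in the session's local time zone.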
Plot hourly counts:
ggplot(datByHour, aes(date, n)) +
  geom_line(aes(group = 1)) +
  scale_x_datetime(date_labels = "%b %e, %Y: %H")
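`datByHour` only has rows for hours with at least one log, so `geom_line()` draws a straight segment across empty stretches, for example from 1995-06-01 17:00 to 1995-06-02 17:00. If empty hours should be drawn as zero, one option (a sketch, not part of the original answer) is to merge the counts onto a complete hourly sequence before plotting. The small `counts` data frame below stands in for `datByHour`, and UTC is assumed.

```r
# Stand-in for datByHour: two observed hours with a gap at 16:00.
counts <- data.frame(
  date = as.POSIXct(c("1995-06-01 15:00:00", "1995-06-01 17:00:00"), tz = "UTC"),
  n = c(1L, 2L)
)

# Every hour from the first to the last observed hour.
all.hours <- data.frame(date = seq(min(counts$date), max(counts$date), by = "hour"))

# Left-join the counts onto the full grid; hours without logs get NA, set to 0.
filled <- merge(all.hours, counts, by = "date", all.x = TRUE)
filled$n[is.na(filled$n)] <- 0L
filled
```

The resulting data frame can be passed to the same `ggplot()` call in place of `datByHour`.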