
Extract a date from a data frame and plot a time series with R

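I want to plot a time series showing the number of log entries per hour. As a first step, I tried to split out the date of each log entry in the data frame so that I could count the entries per hour.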

I have the following data frame:

[Fri Jun  1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk
[Fri Jun  1 16:29:29 1995] httpd: send aborted for ansc86024.usask.ca
[Fri Jun  1 16:31:42 1995] httpd: send aborted for 194.20.24.70
[Fri Jun  1 16:34:11 1995] httpd: send aborted for sw24-70.iol.it
[Fri Jun  1 16:41:02 1995] httpd: send aborted for educ026.usask.ca
[Fri Jun  1 16:41:13 1995] httpd: send aborted for educ026.usask.ca
[Fri Jun  1 16:41:13 1995] httpd: send aborted for sw24-70.iol.it
[Fri Jun  1 16:45:07 1995] httpd: send aborted for 128.233.18.38
[Fri Jun  1 17:26:50 1995] httpd: send aborted for pc117c.nwrel.org
[Fri Jun  1 17:46:53 1995] httpd: send aborted for geoff.usask.ca
[Fri Jun  2 17:57:09 1995] httpd: send aborted for piweba3y.prodigy.com
[Fri Jun  2 17:57:50 1995] httpd: send aborted for piweba3y.prodigy.com
[Fri Jun  2 18:10:15 1995] httpd: send aborted for 193.74.92.109
[Fri Jun  2 20:14:30 1995] httpd: send aborted for 128.233.13.41
[Fri Jun  2 20:15:59 1995] httpd: send aborted for peter.net4.io.org
[Fri Jun  2 21:11:54 1995] httpd: send aborted for ped374.usask.ca

I want to get a plot showing the number of log entries per hour.


I tried to add a date column using the gsub function:

df$date <- gsub(".+[(.*)]","",df[0])
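For reference, two things seem to stop that attempt from working: `df[0]` selects zero columns of the data frame rather than the log text, and inside a regular expression `[(.*)]` is a bracket expression that matches a single `(`, `.`, `*` or `)` character, not a bracketed group. A minimal base-R sketch, assuming the raw log lines sit in a character column named `line` (a hypothetical name; the question does not show the column names), that captures everything between the leading `[` and the first `]`:

```r
# Hypothetical data frame: one raw log line per row in a column called `line`
df <- data.frame(
  line = c("[Fri Jun  1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk",
           "[Fri Jun  2 21:11:54 1995] httpd: send aborted for ped374.usask.ca"),
  stringsAsFactors = FALSE
)

# Keep only the text inside the leading square brackets:
#   ^\\[      a literal "[" at the start of the line
#   ([^]]*)   capture every character up to the first "]"
#   \\].*$    the closing "]" and the rest of the line, which is discarded
df$date <- sub("^\\[([^]]*)\\].*$", "\\1", df$line)

print(df$date)
```

`sub()` is enough here because each line contains only one bracketed timestamp.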

How about this:

# Data in form of a string vector
dat = c("[Fri Jun 1 15:56:37 1995] httpd: send aborted for disarray.demon.co.uk", 
        "[Fri Jun 1 16:29:29 1995] httpd: send aborted for ansc86024.usask.ca", 
        "[Fri Jun 1 16:31:42 1995] httpd: send aborted for 194.20.24.70", 
        "[Fri Jun 1 16:34:11 1995] httpd: send aborted for sw24-70.iol.it", 
        "[Fri Jun 1 16:41:02 1995] httpd: send aborted for educ026.usask.ca", 
        "[Fri Jun 1 16:41:13 1995] httpd: send aborted for educ026.usask.ca", 
        "[Fri Jun 1 16:41:13 1995] httpd: send aborted for sw24-70.iol.it", 
        "[Fri Jun 1 16:45:07 1995] httpd: send aborted for 128.233.18.38", 
        "[Fri Jun 1 17:26:50 1995] httpd: send aborted for pc117c.nwrel.org", 
        "[Fri Jun 1 17:46:53 1995] httpd: send aborted for geoff.usask.ca", 
        "[Fri Jun 2 17:57:09 1995] httpd: send aborted for piweba3y.prodigy.com", 
        "[Fri Jun 2 17:57:50 1995] httpd: send aborted for piweba3y.prodigy.com", 
        "[Fri Jun 2 18:10:15 1995] httpd: send aborted for 193.74.92.109", 
        "[Fri Jun 2 20:14:30 1995] httpd: send aborted for 128.233.13.41", 
        "[Fri Jun 2 20:15:59 1995] httpd: send aborted for peter.net4.io.org", 
        "[Fri Jun 2 21:11:54 1995] httpd: send aborted for ped374.usask.ca")

library(dplyr)
library(lubridate)
library(ggplot2)

Extract the date string:

dat = data.frame(date.string = gsub(".{5}(.*)\\].*", "\\1", dat))

Convert the date string to POSIXct datetime format:

dat$date = as.POSIXct(dat$date.string, format= "%b %e %H:%M:%S %Y")
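`%b` matches abbreviated month names in the current locale, so on a machine whose `LC_TIME` is not English this conversion can silently return `NA`. A hedged base-R sketch that parses in the C locale, fixes the time zone to UTC (an assumption; the logs do not state a zone), and checks for failures. It also shows that the same format string reads both the single-space `Jun 1` form used above and the padded `Jun  1` form from the original log:

```r
# Date strings as produced by the extraction step; the second one uses the
# padded "Jun  1" spacing from the raw log
date.string <- c("Jun 1 15:56:37 1995", "Jun  1 16:29:29 1995", "Jun 2 21:11:54 1995")

# Parse month abbreviations in the C locale, then restore the previous setting
old.locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.POSIXct(date.string, format = "%b %e %H:%M:%S %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old.locale))

# as.POSIXct() returns NA instead of raising an error when a string does not match
if (anyNA(parsed)) stop("some date strings could not be parsed")
print(parsed)
```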

Now aggregate by hour. We drop the minutes and seconds so that we can group by the truncated datetime to get hourly counts:

datByHour = dat %>% 
  mutate(date = as.POSIXct(paste0(paste(year(date),month(date),day(date),sep="-"), 
                                  " ", 
                                  paste(hour(date),"00:00", sep=":")))) %>%
  group_by(date) %>%
  tally 

datByHour
                 date n
1 1995-06-01 15:00:00 1
2 1995-06-01 16:00:00 7
3 1995-06-01 17:00:00 2
4 1995-06-02 17:00:00 2
5 1995-06-02 18:00:00 1
6 1995-06-02 20:00:00 2
7 1995-06-02 21:00:00 1
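If you would rather not depend on dplyr and lubridate, the same hourly tally can be sketched in base R: format each timestamp down to its hour, which drops the minutes and seconds, and count the resulting labels with `table()`. The timestamps below are a subset of the example data, parsed in UTC as an assumption:

```r
# A few of the example timestamps (time zone assumed to be UTC)
date <- as.POSIXct(c("1995-06-01 15:56:37", "1995-06-01 16:29:29",
                     "1995-06-01 16:31:42", "1995-06-02 20:14:30",
                     "1995-06-02 20:15:59"), tz = "UTC")

# Truncate each timestamp to the start of its hour, as a text label
hour.label <- format(date, "%Y-%m-%d %H:00:00")

# Count entries per hour; table() sorts the labels, which is chronological here
counts <- table(hour.label)
datByHour <- data.frame(
  date = as.POSIXct(names(counts), tz = "UTC"),
  n = as.integer(counts)
)
print(datByHour)
```

The result has the same `date`/`n` shape as the dplyr version, so it can be passed to the same plotting code.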

Plot the hourly counts:

ggplot(datByHour, aes(date, n)) + 
  geom_line(aes(group=1)) +
  scale_x_datetime(date_labels="%b %e, %Y: %H")
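One thing to keep in mind with this plot: hours with no log entries (for example 19:00 on June 2) are simply absent from `datByHour`, so `geom_line()` draws a straight segment across the gap. If those hours should appear as zero, one option is to build the full hourly sequence with `seq()` and fill in the counts. A base-R sketch, assuming `datByHour` has the `date` and `n` columns shown above (the values below are the June 2 rows of that output, in UTC as an assumption):

```r
# The June 2 hourly counts from the output above; 19:00 is missing
datByHour <- data.frame(
  date = as.POSIXct(c("1995-06-02 17:00:00", "1995-06-02 18:00:00",
                      "1995-06-02 20:00:00", "1995-06-02 21:00:00"), tz = "UTC"),
  n = c(2L, 1L, 2L, 1L)
)

# Every hour from the first to the last observed hour
all.hours <- seq(min(datByHour$date), max(datByHour$date), by = "hour")

# Look up each hour's count by its numeric timestamp; unmatched hours become 0
idx <- match(as.numeric(all.hours), as.numeric(datByHour$date))
filled <- data.frame(date = all.hours,
                     n = ifelse(is.na(idx), 0L, datByHour$n[idx]))
print(filled)
```

The `filled` data frame can then go into the same `ggplot()` call in place of `datByHour`.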


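Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.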

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM