Cumulative visit time series plot in R

I have a large dataset that logs ids throughout the day at a specific location. What I would like to do is plot the cumulative number of visits by each individual id over the period the data were collected.

A sample of the data looks like this; the full dataset has visits over a number of days. I've tried some variations using cumsum but just can't get it to work.

 dput(df)
structure(list(date = c("06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", 
"06/01/2021", "06/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", 
"07/01/2021", "07/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", 
"08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", 
"08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", 
"08/01/2021"), time = c("08:02:54", "08:04:48", "08:04:49", "08:05:49", 
"08:05:50", "08:05:50", "08:05:51", "08:06:32", "08:06:33", "08:07:34", 
"08:07:34", "08:07:35", "08:07:36", "08:07:36", "08:09:52", "08:09:53", 
"08:09:53", "08:10:02", "08:10:04", "08:10:05", "08:10:05", "08:10:07", 
"08:10:08", "08:10:22", "08:10:42", "08:10:43", "08:11:14", "08:11:15", 
"08:11:38", "08:11:39", "08:11:39", "08:11:40", "08:11:40", "08:11:41", 
"08:11:48", "08:11:50", "08:11:51", "08:11:51", "08:11:52", "08:11:53", 
"08:11:54", "08:11:54", "08:12:36", "08:12:37", "08:12:38", "08:12:38", 
"08:13:25", "08:13:25", "08:14:09", "08:14:18", "08:14:19", "08:14:24", 
"08:14:24", "08:14:25", "08:14:37", "08:14:38", "08:14:58", "08:14:58", 
"08:14:59", "08:14:59", "08:15:03", "08:15:04", "08:15:04", "08:15:05", 
"08:15:12", "08:15:13", "08:15:13", "08:15:33", "08:15:34", "08:15:37", 
"08:15:39", "08:15:51", "08:16:12", "08:16:13", "08:16:14", "08:16:31", 
"08:16:32", "08:16:42", "08:17:00", "08:17:00", "08:17:01", "08:17:03", 
"08:17:19", "08:17:20", "08:17:22", "08:17:26", "08:17:26", "08:17:27", 
"08:17:27", "08:17:32", "08:17:32", "08:17:33", "08:17:50", "08:17:51", 
"08:17:51", "08:17:52", "08:18:38", "08:18:39", "08:18:39", "08:18:40", 
"08:18:41", "08:18:41", "08:19:44", "08:19:44", "08:19:46", "08:19:46", 
"08:22:27", "08:23:20", "08:23:20", "08:23:47", "08:23:48", "08:23:48", 
"08:23:52", "08:23:52"), id = c(2L, 3L, 2L, 3L, 4L, 5L, 3L, 4L, 
3L, 2L, 3L, 3L, 2L, 4L, 5L, 2L, 3L, 2L, 2L, 2L, 4L, 3L, 2L, 2L, 
4L, 5L, 3L, 2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L, 3L, 5L, 4L, 5L, 
4L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 
5L, 3L, 2L, 4L, 5L, 3L, 2L, 2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L, 
3L, 5L, 4L, 5L, 4L, 3L, 2L, 2L, 3L, 2L, 4L, 5L, 3L, 3L, 4L, 5L, 
6L, 4L, 3L, 5L, 4L, 5L, 4L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 
2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L, 3L)), class = "data.frame", row.names = c(NA, 
-114L))
head(df)
        date     time id
1 06/01/2021 08:02:54  2
2 06/01/2021 08:04:48  3
3 06/01/2021 08:04:49  2
4 06/01/2021 08:05:49  3
5 06/01/2021 08:05:50  4
6 06/01/2021 08:05:50  5

Do you mean something like this?

Use lubridate to convert your data into a datetime object (easier to handle); then cumsum(!duplicated(datetime)) counts the number of (unique) visits per id. The result is plotted with ggplot2.

The last line lets you modify the x-axis breaks.

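library(dplyr)
library(lubridate)
library(ggplot2)

df %>%
  # dates are dd/mm/YYYY, so the four-digit year needs %Y (not %y)
  mutate(datetime = as_datetime(paste(as.Date(date, "%d/%m/%Y"), time))) %>%
  arrange(datetime) %>%
  group_by(id) %>%
  # running count of distinct timestamps for each id
  mutate(cum_visits = cumsum(!duplicated(datetime))) %>%
  ggplot(aes(x = datetime, y = cum_visits, color = factor(id), group = id)) +
  geom_line() +
  scale_x_datetime(breaks = scales::date_breaks("1 day"), date_labels = "%D - %H:%M")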
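To check what cumsum(!duplicated(datetime)) computes within each id, here is a base-R sketch of the same idea on a few made-up rows in the question's format. It uses ave() in place of dplyr's group_by(); the toy rows and the names visits and cum_visits are hypothetical. It also shows the date parsing with %Y for the four-digit year.

```r
# Hypothetical rows in the question's format: dd/mm/YYYY date, HH:MM:SS time.
# id 2 appears twice with the same timestamp on 06/01/2021.
visits <- data.frame(
  date = c("06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "07/01/2021"),
  time = c("08:02:54", "08:04:48", "08:04:49", "08:04:49", "08:11:39"),
  id   = c(2L, 3L, 2L, 2L, 3L)
)

# Four-digit years need %Y; %y reads just two digits ("20" from "2021")
visits$datetime <- as.POSIXct(paste(visits$date, visits$time),
                              format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Within each id, flag timestamps not seen before and accumulate the flags
visits$cum_visits <- ave(as.numeric(visits$datetime), visits$id,
                         FUN = function(t) cumsum(!duplicated(t)))

print(visits[, c("id", "datetime", "cum_visits")])
```

The two identical id-2 timestamps count as one visit, so that id's running total stays at 2 on the fourth row.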

A ggplot() solution that treats date as a factor variable, labelling either selected time steps or all of them.

Total visits by id and date:

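library(data.table)
dd <- as.data.table(df)
# number of visits by each id on each date (added by reference)
dd[, count := .N, by = .(id, date)]
# convert date to a factor: one level, and one facet panel, per day
dd[, date := as.factor(date)]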
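Note that count := .N, by = .(id, date) stores each id's daily total on every row of that group, not a running sum. If a cumulative count is wanted instead, seq_len(.N) per group gives one. A minimal sketch on hypothetical toy rows; log_dt, total, running_day and running_all are illustrative names, not part of the answer above.

```r
library(data.table)

# Hypothetical toy log in the question's format
log_dt <- data.table(
  date = c("06/01/2021", "06/01/2021", "06/01/2021", "07/01/2021", "07/01/2021"),
  time = c("08:02:54", "08:04:48", "08:04:49", "08:11:39", "08:11:40"),
  id   = c(2L, 3L, 2L, 2L, 2L)
)

# Parse dd/mm/YYYY so rows sort chronologically, then order by day and time
log_dt[, day := as.Date(date, format = "%d/%m/%Y")]
setorder(log_dt, day, time)

# Daily total per id (like `count`), repeated on every row of the group
log_dt[, total := .N, by = .(id, day)]
# Running count of visits per id within each day
log_dt[, running_day := seq_len(.N), by = .(id, day)]
# Running count of visits per id over the whole period
log_dt[, running_all := seq_len(.N), by = id]

print(log_dt)
```

Grouping by id alone, as in running_all, accumulates across days, which is the quantity the question describes.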
   

Create the plot:

library(ggplot2)

ggplot(dd, aes(x = time, y = id, fill = count)) +
  geom_tile() +
  # labels only a subset of time steps; remove this line to label all of them
  scale_x_discrete(breaks = c("08:02:54", "08:05:50", "08:07:34", "08:10:02",
                              "08:13:25", "08:16:32", "08:19:44", "08:23:52")) +
  facet_wrap(~date) +
  scale_fill_gradient(low = "lightyellow", high = "red") +
  labs(x = "Time", y = "Id", title = "", fill = "Number of visits") +
  theme_bw() +
  theme(plot.title   = element_text(hjust = 0.5, face = "bold", size = 20, color = "black"),
        axis.title.x = element_text(family = "Times", face = "bold", size = 16, color = "black"),
        axis.title.y = element_text(family = "Times", face = "bold", size = 16, color = "black"),
        axis.text.x  = element_text(hjust = 1, face = "bold", size = 14, color = "black", angle = 90),
        axis.text.y  = element_text(hjust = 1, face = "bold", size = 14, color = "black"),
        legend.title = element_text(family = "Times", color = "black", size = 16, face = "bold"),
        legend.text  = element_text(family = "Times", color = "black", size = 14, face = "bold"),
        legend.position = "right",
        strip.text.x = element_text(size = 16, colour = "black", family = "Times", face = "bold"))

The plot can also be drawn without facet_wrap() by removing the facet_wrap(~date) line.
