
Plotting slightly disorganised Time Series Data in R

Sorry, this might have an obvious answer, but I'm a little unsure what to do.

Say, for instance, I have a dataset with a list of people's names, the number of sales they made, and the dates they made those sales, all in the following format:

Name    |    Date    |     Sales
------------------------------------
AAA     | 01/01/2001 |     50
AAA     | 01/02/2001 |     62
AAA     | 01/03/2001 |     73
...     |    ...     |     ...
AAA     | 05/15/2001 |     20
BBB     | 02/06/2001 |     51
BBB     | 02/09/2001 |     45
...     |    ...     |     ...
BBB     | 04/13/2001 |     3
CCC     | 01/22/2001 |     78
...     |    ...     |     ...
...     |    ...     |     ...

Basically, my data looks like the above: there are multiple different names, and the dates for each name do not line up (e.g. one person may start much earlier in the year than another and therefore has sales data much earlier in the year). In addition, the dates may skip forward a bit: a date of 4/3/2001 may be followed by 4/25/2001 in the next row.
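A minimal base R sketch of the layout described above, assuming the dates arrive as mm/dd/yyyy strings. The `sales` data frame and its rows are hypothetical sample values copied from the table, not the real dataset. It converts the strings to `Date` and shows the two irregularities: different start dates per person and jumps between consecutive dates.

```r
# Hypothetical sample rows in the same shape as the table above
sales <- data.frame(
  Name  = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"),
  Date  = c("01/01/2001", "01/02/2001", "01/03/2001",
            "02/06/2001", "02/09/2001", "01/22/2001"),
  Sales = c(50, 62, 73, 51, 45, 78),
  stringsAsFactors = FALSE
)

# Convert the mm/dd/yyyy strings to Date so they sort and plot chronologically
sales$Date <- as.Date(sales$Date, format = "%m/%d/%Y")

# Earliest sale per person: the series do not start on the same day
first_sale <- sapply(split(sales$Date, sales$Name),
                     function(d) format(min(d)))
print(first_sale)

# Size of the jump mentioned above (4/3/2001 followed by 4/25/2001), in days
gap <- diff(as.Date(c("4/3/2001", "4/25/2001"), format = "%m/%d/%Y"))
print(gap)
```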

What I would like to do now is plot the data for the whole year so that all the different people (i.e. AAA, BBB, CCC, ...), all the sales they made, and the dates of those sales appear in one big plot.

Now, I can think of one way to do this: by first using the subset() function to subset the dataset by name, I may be able to plot the data that way. The problem is that I find this a bit inefficient, and I'm sure that R must have far better ways to plot time series data, even if the data is a little weird. If anyone has suggestions or could provide a bit of help, I'd appreciate it. Thanks in advance.
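For reference, the subset() idea might look roughly like the sketch below. It is only an illustration in base R: the `sales` data frame holds hypothetical sample values, and the loop draws one line per name onto a shared empty plot.

```r
# Hypothetical sample data in the same shape as the table above
sales <- data.frame(
  Name  = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  Date  = as.Date(c("01/01/2001", "01/02/2001", "01/03/2001",
                    "02/06/2001", "02/09/2001",
                    "01/22/2001", "01/25/2001"), format = "%m/%d/%Y"),
  Sales = c(50, 62, 73, 51, 45, 78, 70),
  stringsAsFactors = FALSE
)

people <- unique(sales$Name)

# Empty plot whose axes span every date and every sales value
plot(range(sales$Date), range(sales$Sales), type = "n",
     xlab = "Date", ylab = "Sales")

# One subset() call and one line per person
for (i in seq_along(people)) {
  one_person <- subset(sales, Name == people[i])
  lines(one_person$Date, one_person$Sales, col = i)
}

legend("topright", legend = people, col = seq_along(people), lty = 1)
```

This works, but every name needs its own subset and its own drawing call, which is the inefficiency in question.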

Are you looking for something like this?

library(dplyr)
library(tidyr)
library(ggplot2)
# Create an example data.frame: three people with different, overlapping date ranges
Date <- c(seq(as.Date("2001-01-03"), as.Date("2001-10-17"), by = 1),
          seq(as.Date("2001-05-10"), as.Date("2001-12-17"), by = 1),
          seq(as.Date("2001-04-12"), as.Date("2001-11-17"), by = 1))
Name <- c(rep("AAA", 288), rep("BBB", 222), rep("CCC", 220))
Sales <- c(sample(10:20, 288, replace = TRUE),
           sample(50:60, 222, replace = TRUE),
           sample(80:90, 220, replace = TRUE))
df <- data.frame(Name, Date, Sales)

# Select specific rows (dates) to create irregular time series (missing dates),
# then spread to wide format: one column per Name
df1 <- df[c(1:50, 100:150, 190:288, 289:370, 400:450, 480:510, 511:640, 670:730), ] %>%
  tidyr::spread(Name, Sales)

# Join the irregular series onto a continuous sequence of dates for all of 2001,
# so that days without data become NA, then convert back to long format
df_whole_yr <- data.frame(Date = seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = 1)) %>%
  dplyr::left_join(df1, by = "Date") %>%
  tidyr::gather("Name", "Sales", 2:4)

# Plot one line per Name; the NA rows show up as gaps in the lines
ggplot(df_whole_yr, aes(x = Date, y = Sales, color = Name)) +
  geom_line(linewidth = 0.2)  # use size = 0.2 with ggplot2 older than 3.4.0
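As a possible shortcut for the spread/join/gather steps, `tidyr::complete()` can add the missing days directly to long-format data. The sketch below is an assumption-laden illustration, not part of the original answer: the `sales` rows are hypothetical sample values, and the date range is shortened to five days to keep the example small.

```r
library(tidyr)
library(ggplot2)

# Hypothetical long-format sample in the shape shown in the question
sales <- data.frame(
  Name  = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date  = as.Date(c("01/01/2001", "01/02/2001", "01/05/2001",
                    "01/02/2001", "01/04/2001"), format = "%m/%d/%Y"),
  Sales = c(50, 62, 73, 51, 45)
)

# One row per Name per calendar day; days without a sale get Sales = NA
all_days <- seq(as.Date("2001-01-01"), as.Date("2001-01-05"), by = "day")
sales_full <- complete(sales, Name, Date = all_days)

# geom_line() breaks the line at NA values, so missing days appear as gaps;
# geom_point() keeps isolated observations visible
p <- ggplot(sales_full, aes(x = Date, y = Sales, color = Name)) +
  geom_line() +
  geom_point()
print(p)
```

If the gaps are not wanted, plotting the original long-format data without `complete()` connects each person's consecutive observations instead.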

