How to fix a time series with missing dates across multiple observations?

Let us consider the following time series with numbered days:

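library(data.table)
test <- data.table(day = sample(1:9, 15, TRUE), name = sort(rep(c("a", "b", "c"), 5)), value = sample(1:3, 15, TRUE))
# keep only the first row for each (name, day) pair, then sort for display
test[test[, !duplicated(day), by = name][, V1]][order(name, -day)]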
    day name value
 1:   7    a     3
 2:   4    a     2
 3:   2    a     2
 4:   1    a     2
 5:   9    b     1
 6:   8    b     3
 7:   6    b     3
 8:   5    b     2
 9:   3    b     3
10:   7    c     1
11:   6    c     1
12:   4    c     1
13:   3    c     3
14:   1    c     2

As you can see, we made some measurements on three objects a, b and c over 9 days. We would like to perform a day-to-day value comparison between the three objects. Unfortunately, some dates are randomly missing, which breaks an algorithm that would otherwise be straightforward.

I would like to inject rows into this data.table so that all objects have the same days. Injected rows would default the value to 0.

All days available across all objects can be listed with:

> sort(unique(test[,day]) )
[1] 1 2 3 4 5 6 7 8 9

So, for instance, object a is missing days 3, 5, 6, 8 and 9.
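One way to list the missing days for every object at once is to compare each object's days against the full set. The following is a minimal base R sketch that rebuilds the 14 rows shown above as a plain data.frame; the names all_days and missing_days are illustrative and not part of the original post.

```r
# The 14 example rows from the question, as a plain data.frame.
test <- data.frame(
  day   = c(7L, 4L, 2L, 1L, 9L, 8L, 6L, 5L, 3L, 7L, 6L, 4L, 3L, 1L),
  name  = rep(c("a", "b", "c"), times = c(4L, 5L, 5L)),
  value = c(3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Every day observed for at least one object.
all_days <- sort(unique(test$day))

# For each object, the days in all_days that it never reports.
missing_days <- lapply(split(test$day, test$name),
                       function(d) setdiff(all_days, d))

missing_days$a  # 3 5 6 8 9
```

The same list also shows that b is missing days 1, 2, 4 and 7, and c is missing days 2, 5, 8 and 9.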

After the row injection, the data.table for a would look like:

test[name=="a"]
   day name value
1:   1    a     2
2:   2    a     2
3:   3    a     0
4:   4    a     2
5:   5    a     0
6:   6    a     0
7:   7    a     3
8:   8    a     0
9:   9    a     0

Any idea on how to tackle this problem? Maybe some libraries such as lubridate already know how to do that.

Using the data that you posted, which I copied and put into a data.table, you can do this using:

library(data.table)
## create a table with all days and names
all.dates <- setDT(expand.grid(day=sort(unique(test[,day])),name=sort(unique(test[,name]))))
## perform a left-outer-join of all.dates with test
setkey(all.dates)
setkey(test,day,name)
test <- test[all.dates]
## set those NAs to zero (by reference, keeping value as an integer column)
test[is.na(value), value := 0L]
##   day name value
##1    1    a     2
##2    1    b     0
##3    1    c     2
##4    2    a     2
##5    2    b     0
##6    2    c     0
##7    3    a     0
##8    3    b     3
##9    3    c     3
##10   4    a     2
##11   4    b     0
##12   4    c     1
##13   5    a     0
##14   5    b     2
##15   5    c     0
##16   6    a     0
##17   6    b     3
##18   6    c     1
##19   7    a     3
##20   7    b     0
##21   7    c     1
##22   8    a     0
##23   8    b     3
##24   8    c     0
##25   9    a     0
##26   9    b     1
##27   9    c     0
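A more compact variant of the same idea, sketched here under the assumption that test holds the 14 rows from the question: data.table's CJ() builds the cross join directly, and the on= argument avoids having to set keys first. The name filled is illustrative.

```r
library(data.table)

# The 14 example rows from the question.
test <- data.table(
  day   = c(7L, 4L, 2L, 1L, 9L, 8L, 6L, 5L, 3L, 7L, 6L, 4L, 3L, 1L),
  name  = rep(c("a", "b", "c"), times = c(4L, 5L, 5L)),
  value = c(3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 2L)
)

# CJ() builds every (name, day) combination, sorted by name then day.
grid <- CJ(name = unique(test$name), day = sort(unique(test$day)))

# Joining test onto the grid keeps one row per combination;
# value is NA wherever test had no matching row.
filled <- test[grid, on = c("name", "day")]

# Replace the NAs with 0 by reference.
filled[is.na(value), value := 0L]

filled[name == "a"]  # 9 rows, value is 2 2 0 2 0 0 3 0 0 for days 1 to 9
```

Because the rows follow the order of the grid, the result comes out grouped by object and sorted by day, which is convenient for a day-to-day comparison.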

Data:

test <- structure(list(day = c(7L, 4L, 2L, 1L, 9L, 8L, 6L, 5L, 3L, 7L, 
6L, 4L, 3L, 1L), name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
    value = c(3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 
    3L, 2L)), .Names = c("day", "name", "value"), class = c("data.table", 
"data.frame"), row.names = c(NA, -14L))
setDT(test)  # restore the internal self-reference, which dput() output cannot carry
 ##    day name value
 ## 1:   7    a     3
 ## 2:   4    a     2
 ## 3:   2    a     2
 ## 4:   1    a     2
 ## 5:   9    b     1
 ## 6:   8    b     3
 ## 7:   6    b     3
 ## 8:   5    b     2
 ## 9:   3    b     3
 ##10:   7    c     1
 ##11:   6    c     1
 ##12:   4    c     1
 ##13:   3    c     3
 ##14:   1    c     2

In the tidyverse, one of the packages (tidyr) has a wrapper, complete(), for the expand.grid and left_join pattern.

library(tidyverse)
test$day <- factor(test$day, levels = 1:9)
test$name = factor(test$name, levels = c("a", "b", "c"))
test %>% 
    complete(day, name, fill = list(value = 0))
#> # A tibble: 32 × 3
#>       day   name value
#>    <fctr> <fctr> <dbl>
#> 1       1      a     0
#> 2       1      b     0
#> 3       1      c     0
#> 4       2      a     0
#> 5       2      b     0
#> 6       2      c     1
#> 7       3      a     1
#> 8       3      b     0
#> 9       3      c     0
#> 10      4      a     3
#> # ... with 22 more rows
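complete() can also be given the set of days directly, which avoids converting the columns to factors. This is a sketch assuming the 14 rows from the question and days numbered 1 to 9; filled is an illustrative name.

```r
library(tidyr)

# The 14 example rows from the question, as a plain data.frame.
test <- data.frame(
  day   = c(7L, 4L, 2L, 1L, 9L, 8L, 6L, 5L, 3L, 7L, 6L, 4L, 3L, 1L),
  name  = rep(c("a", "b", "c"), times = c(4L, 5L, 5L)),
  value = c(3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Expand to every (name, day) pair for days 1 to 9 and
# fill the value of the newly created rows with 0.
filled <- complete(test, name, day = 1:9, fill = list(value = 0L))

filled[filled$name == "a", ]
```

Passing day = 1:9 states the expected range explicitly, so a day that no object reports would still appear in the result.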

You can also do it with expand.grid and a left join.

possibilities = expand.grid(levels(test$day), unique(test$name))

possibilities %>%
    left_join(test, by = c("Var1" = "day", "Var2" = "name")) %>%
    mutate(value = ifelse(is.na(value), 0, value))
#>    Var1 Var2 value
#> 1     1    a     0
#> 2     2    a     0
#> 3     3    a     1
#> 4     4    a     3
#> 5     5    a     1
