For each ID, fill in missing weekly data points with 0 in R

Question

I have data in this shape:

> head(posts)
     id    week_number num_posts
1 UKL1.1           1         4
2 UKL1.1           6         9
3 UKL1.2           1         2
4 UKL1.3           1         8
5 UKL1.3           2         7
6 UKL1.3           3         3

I want each id to have a row for every week_number (1, 2, 3, 4, 5, 6); if a week_number isn't already in the data for that id, num_posts should be 0.

I've seen this done using the zoo package with true time-series data. Is there a way to do this directly, without creating a proper POSIXct or Date version of week_number and using that package?

Answer 1

Here's a way using data.table.

library(data.table)
setDT(posts)                           # convert posts to a data.table
all.wks <- posts[,list(week_number=min(week_number):max(week_number)),by=id]
setkey(posts,id,week_number)           # index on id and week number
setkey(all.wks,id,week_number)         # index on id and week number
result <- posts[all.wks]               # data.table join is very fast
result[is.na(num_posts),num_posts:=0]  # convert NA to 0
result
#         id week_number num_posts
#  1: UKL1.1           1         4
#  2: UKL1.1           2         0
#  3: UKL1.1           3         0
#  4: UKL1.1           4         0
#  5: UKL1.1           5         0
#  6: UKL1.1           6         9
#  7: UKL1.2           1         2
#  8: UKL1.3           1         8
#  9: UKL1.3           2         7
# 10: UKL1.3           3         3
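
Note that the join above only fills the gaps between each id's first and last observed week, so UKL1.2 still has a single row. If every id should cover weeks 1 through 6, as the question describes, one possible sketch is to build the full id-by-week grid with CJ() and join against it. The fixed range 1:6 and the rebuilt posts table are assumptions made for illustration.

```r
library(data.table)

# Sample data rebuilt from the question (assumed to be integer columns)
posts <- data.table(
  id          = c("UKL1.1", "UKL1.1", "UKL1.2", "UKL1.3", "UKL1.3", "UKL1.3"),
  week_number = c(1L, 6L, 1L, 1L, 2L, 3L),
  num_posts   = c(4L, 9L, 2L, 8L, 7L, 3L)
)

# Assumption: every id should have weeks 1..6
grid <- CJ(id = unique(posts$id), week_number = 1:6)

# Keep every row of grid; num_posts is NA where posts has no match
full <- posts[grid, on = c("id", "week_number")]
full[is.na(num_posts), num_posts := 0L]   # convert NA to 0
full
```

With three ids and six weeks this gives 18 rows, one per id and week.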

Another way:

my_fun <- function(x) {
    weeks = with(x, min(week_number):max(week_number))
    posts = with(x, num_posts[match(weeks, week_number)])
    list(week_number=weeks, num_posts=posts)
}
setDT(posts)[, my_fun(.SD), by=id]

.SD means "subset of data": for each group specified in by, it contains that group's rows, with all columns except the grouping column (here, id).
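
To see what .SD holds for each group, a small check like the following may help. The posts table is rebuilt from the question's sample, and the summary column names (cols, n) are made up for this illustration.

```r
library(data.table)

# Sample data rebuilt from the question
posts <- data.table(
  id          = c("UKL1.1", "UKL1.1", "UKL1.2", "UKL1.3", "UKL1.3", "UKL1.3"),
  week_number = c(1L, 6L, 1L, 1L, 2L, 3L),
  num_posts   = c(4L, 9L, 2L, 8L, 7L, 3L)
)

# For each id, list the columns present in .SD and count its rows
sd_info <- posts[, .(cols = paste(names(.SD), collapse = ","),
                     n    = nrow(.SD)),
                 by = id]
sd_info
```

The cols value does not include id, since the grouping column is left out of .SD.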

Then you can replace the NAs as shown above.
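
Alternatively, the NA replacement can be folded into the helper itself. This is only a sketch of that variant; fill_weeks is a hypothetical name, and the posts table is rebuilt from the question's sample.

```r
library(data.table)

# Sample data rebuilt from the question
posts <- data.table(
  id          = c("UKL1.1", "UKL1.1", "UKL1.2", "UKL1.3", "UKL1.3", "UKL1.3"),
  week_number = c(1L, 6L, 1L, 1L, 2L, 3L),
  num_posts   = c(4L, 9L, 2L, 8L, 7L, 3L)
)

# Hypothetical helper: expand one id's weeks and zero-fill the gaps
fill_weeks <- function(x) {
  weeks <- min(x$week_number):max(x$week_number)
  n <- x$num_posts[match(weeks, x$week_number)]  # NA where a week is missing
  n[is.na(n)] <- 0L                              # convert NA to 0
  list(week_number = weeks, num_posts = n)
}

result <- posts[, fill_weeks(.SD), by = id]
result
```

As in the original version, each id is only filled between its own first and last week.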

Solution 1: accepted, 2014-11-07 19:18:14
