R - How to add rows for missing values for unique group sequences?

Question

My problem is similar to this previous question: "Fastest way to add rows for missing values in a data.frame?"

I can't figure out how to add rows padded with NA when the min and max of period differ by group.

> red <- data.frame(project = c(6, 6, 6, 6, 6, 9, 9, 9), period = c(1, 2, 5:7, 2, 4, 5), v3 = letters[1:8], v4 = rep(c("red", "yellow"), 4))
> red
  project period v3     v4
1       6      1  a    red
2       6      2  b yellow 
3       6      5  c    red
4       6      6  d yellow
5       6      7  e    red
6       9      2  f yellow
7       9      4  g    red
8       9      5  h yellow

I want it to look like:

project period v3     v4
      6      1  a    red
      6      2  b yellow
      6      3 NA     NA
      6      4 NA     NA
      6      5  c    red
      6      6  d yellow
      6      7  e    red
      9      2  f yellow
      9      3 NA     NA
      9      4  g    red
      9      5  h yellow

When I used

library(data.table)
DT=as.data.table(red)
setkey(DT, project, period)

DT[CJ(unique(project), seq(min(period), max(period)))]

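it gave every project periods 1 through 7, because min(period) and max(period) are computed over the whole table rather than within each project. Project 6 should have periods 1-7, but project 9 should only have periods 2-5.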
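The gap between the table-wide range and the per-project ranges can be checked directly. This is a minimal base R sketch that rebuilds the example data; the names `overall` and `per_project` are only illustrative and are not part of the original post.

```r
# Rebuild the example data from the question
red <- data.frame(
  project = c(6, 6, 6, 6, 6, 9, 9, 9),
  period  = c(1, 2, 5:7, 2, 4, 5),
  v3      = letters[1:8],
  v4      = rep(c("red", "yellow"), 4)
)

# Range over the whole table: this is what seq(min(period), max(period))
# sees when it is evaluated once for all projects
overall <- range(red$period)

# Range within each project: this is what the desired output needs
per_project <- lapply(split(red$period, red$project), range)

overall
per_project
```

Any fix therefore has to build the period sequence separately for each project before joining back to the data.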

I've tried fiddling with DT[, .SD[which.max(period)], by = project], but no luck.

I thought it should be something simple in seq(), but I tried seq(min(period, by = project)) and that did not work either.

Thank you!

Answer 1

DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]
#    project period v3     v4
# 1:       6      1  a    red
# 2:       6      2  b yellow
# 3:       6      3 NA     NA
# 4:       6      4 NA     NA
# 5:       6      5  c    red
# 6:       6      6  d yellow
# 7:       6      7  e    red
# 8:       9      2  f yellow
# 9:       9      3 NA     NA
#10:       9      4  g    red
#11:       9      5  h yellow
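The same idea can also be written with an explicit `on =` join, which does not rely on DT having been keyed beforehand. This is a sketch under that assumption, not the answerer's own code; `grid` and `filled` are hypothetical names chosen for illustration.

```r
library(data.table)

# Rebuild the example data as a data.table (no key is set)
DT <- data.table(
  project = c(6, 6, 6, 6, 6, 9, 9, 9),
  period  = c(1, 2, 5:7, 2, 4, 5),
  v3      = letters[1:8],
  v4      = rep(c("red", "yellow"), 4)
)

# One row per (project, period), spanning each project's own min..max
grid <- DT[, .(period = seq(min(period), max(period))), by = project]

# Join DT onto the grid: periods absent from DT come back with NA in v3 and v4
filled <- DT[grid, on = .(project, period)]
filled
```

Naming the join columns in `on =` makes the dependency explicit, at the cost of one extra intermediate table.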

Answer 2

I don't know whether this is the idiomatic way, but I was able to produce your desired output by first creating a group index and then subsetting the matching rows out of .SD for each index value.

DT[, indx := .GRP, project][, 
     .SD[CJ(unique(project), seq(min(period), max(period)))], indx]

#     indx project period v3     v4
#  1:    1       6      1  a    red
#  2:    1       6      2  b yellow
#  3:    1       6      3 NA     NA
#  4:    1       6      4 NA     NA
#  5:    1       6      5  c    red
#  6:    1       6      6  d yellow
#  7:    1       6      7  e    red
#  8:    2       9      2  f yellow
#  9:    2       9      3 NA     NA
# 10:    2       9      4  g    red
# 11:    2       9      5  h yellow

Answer 3

The accepted answer does not work (anymore?), but it is close.

setkey(DT,project,period)
DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]

Notes: 1. The period sequence has to be wrapped in a list, i.e. .(min(period):max(period)), for this to work. 2. @MiamiCG, I am guessing you needed allow.cartesian because the table was not keyed first. If you set allow.cartesian = TRUE, there is no error message, but the result is not correct.

Update: @eddi has updated his answer to match mine, so it works now.

3 answers

solution1
2 ACCEPTED 2015-01-21 18:41:08

solution2
2 2015-01-21 18:41:19

solution3
2 2017-06-21 21:38:58
