My problem is similar to this previous question Fastest way to add rows for missing values in a data.frame?
I can't figure out how to add rows padded with "NA" when the min/max is different by group.
> red<-data.frame(project = c(6, 6, 6, 6, 6, 9, 9, 9), period =c(1, 2, 5:7, 2, 4, 5), v3=letters[1:8], v4=c("red", "yellow", recursive = T))
> red
project period v3 v4
1 6 1 a red
2 6 2 b yellow
3 6 5 c red
4 6 6 d yellow
5 6 7 e red
6 9 2 f yellow
7 9 4 g red
8 9 5 h yellow
I want it to look like:
project period v3 v4
6 1 a red
6 2 b yellow
6 3 NA NA
6 4 NA NA
6 5 c red
6 6 d yellow
6 7 e red
9 2 f yellow
9 3 NA NA
9 4 g red
9 5 h yellow
When I used
library(data.table)
DT=as.data.table(red)
setkey(DT, project, period)
DT[CJ(unique(project), seq(min(period), max(period)))]
it made each project group have 7 periods; Project 6 should have periods 1-7, but Project 9 should have periods 2-5.
I've tried fiddling with .SD[ which.max(period)], by=project]
but no cigar.
I thought it should be something simple in the seq(), but I tried seq(min(period, by=project))
with no luck
Thank you!
DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]
# project period v3 v4
# 1: 6 1 a red
# 2: 6 2 b yellow
# 3: 6 3 NA NA
# 4: 6 4 NA NA
# 5: 6 5 c red
# 6: 6 6 d yellow
# 7: 6 7 e red
# 8: 9 2 f yellow
# 9: 9 3 NA NA
#10: 9 4 g red
#11: 9 5 h yellow
I don't know if this the idiomatic way or not, but I was able to achieve your desired output, by first creating an index and then subsetting the correct rows out of .SD
per that index
DT[, indx := .GRP, project][,
.SD[CJ(unique(project), seq(min(period), max(period)))], indx]
# indx project period v3 v4
# 1: 1 6 1 a red
# 2: 1 6 2 b yellow
# 3: 1 6 3 NA NA
# 4: 1 6 4 NA NA
# 5: 1 6 5 c red
# 6: 1 6 6 d yellow
# 7: 1 6 7 e red
# 8: 2 9 2 f yellow
# 9: 2 9 3 NA NA
# 10: 2 9 4 g red
# 11: 2 9 5 h yellow
The accepted answer does not work (anymore?), but it is close.
setkey(DT,project,period)
DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]
Note: 1. you need to make the period sequence into the list to work. 2. @MiamiCG, I am guessing you needed to allow cartesian because of not keying the table first. If you set it to TRUE, there will be no error message, but the result will not be correct.
Update: @eddi has updated his answer to match mine, so it is working.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.