[英]Create missing observations in panel data
我正在研究具有唯一案例標識符的面板數據和觀察時間點的列(長格式)。 有時間常數變量和時變觀測:
id time tc1 obs1
1 101 1 male 4
2 101 2 male 5
3 101 3 male 3
4 102 1 female 6
5 102 3 female 2
6 103 1 male 2
對於我的模型,我現在需要每個時間點每個id都有完整記錄的數據。 換句話說,如果缺少觀察,我仍然需要為觀察到的變量插入id,時間,時間常數變量和NA(如線(102,2,“女性”,NA)在上面的例子中)。 所以我的問題是:
如果有人能夠對此有所了解,那將會很棒。
非常感謝提前!
謝謝大家的回復。 這是我最終做的,這是幾種建議方法的混合。 問題是我每行有幾個時變變量(obs1-obsn)而且我沒有得到dcast來容納它 - value.name不需要多於參數。
# create all possible permutations of id and year
iddat = expand.grid(id = unique(dataset$id), time = (c(1996,1999,2002,2005,2008,2011)))
iddat <- iddat[order(iddat$id, iddat$time), ]
# add permutations to existing data, combinations so far missing are NA
dataset_new <- merge(dataset, iddat, all.x=TRUE, all.y=TRUE, by=c("id", "time"))
# drop time-constant variables from data
dataset_new[c("tc1", "tc2", "tc3")] <- list(NULL)
# merge back time-constant variables from original data
temp <- dataset[c("tc1", "tc2", "tc3")]
dataset_new <- merge(dataset_new, temp, by=c("id"))
# sort
dataset_new <- dataset_new[order(dataset_new$id, dataset_new$time), ]
dataset_new <- unique(dataset_new) # some rows are duplicates after last merge, no idea why
rm(temp)
rm(iddat)
一切順利,再次感謝,馬特
可能有更優雅的方式,但這里有一個選擇。 我假設您需要id
和time
所有組合,但不需要tc1
(即tc1
與id
綁定)。
# your data
df <- read.table(text = " id time tc1 obs1
1 101 1 male 4
2 101 2 male 5
3 101 3 male 3
4 102 1 female 6
5 102 3 female 2
6 103 1 male 2", header = TRUE)
首先將數據轉換為寬格式以引入NA,然后轉換回long。
library('reshape2')
df_wide <- dcast(
df,
id + tc1 ~ time,
value.var = "obs1",
fill = NA
)
df_long <- melt(
df_wide,
id.vars = c("id","tc1"),
variable.name = "time",
value.name = "obs1"
)
# sort by id and then time
df_long[order(df_long$id, df_long$time), ]
id tc1 time obs1
1 101 male 1 4
4 101 male 2 5
7 101 male 3 3
2 102 female 1 6
5 102 female 2 NA
8 102 female 3 2
3 103 male 1 2
6 103 male 2 NA
9 103 male 3 NA
您可以創建一個空數據集,然后合並到您匹配的記錄中。
# Create dataset. For you actual data ,you would replace c(1:3) with
# c(1:max(yourdata$id)) and adjust the number of time periods to match your data.
id <- rep(c(1:3), each = 3)
time <- rep(c(1:3), 3)
df <- data.frame(id,time)
test <- df[c(1,3,5,7,9),]
test$tc1 <- c("male", "male", "female", "male", "male")
test$obs1 <-c(4,5,3,6,2)
merge(df, test, by.x = c("id","time"), by.y = c("id","time"), all.x = TRUE)
結果:
id time tc1 obs1
1 1 1 male 4
2 1 2 <NA> NA
3 1 3 male 5
4 2 1 <NA> NA
5 2 2 female 3
6 2 3 <NA> NA
7 3 1 male 6
8 3 2 <NA> NA
9 3 3 male 2
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.