Create missing observations in panel data

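I am working with panel data in long format that has a unique case identifier and a column for the time point of each observation. There are time-constant variables and time-varying observations: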

    id  time    tc1     obs1
1   101 1       male    4
2   101 2       male    5
3   101 3       male    3
4   102 1       female  6
5   102 3       female  2
6   103 1       male    2

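For my model, I now need data with a complete record for every id at every time point. In other words, if an observation is missing, I still need a row containing the id, the time, the time-constant variables, and NA for the observed variables (such as the row (102, 2, "female", NA) in the example above). So my questions are:

  1. How can I determine whether a row with a given combination of id and time already exists in my dataset?
  2. If it does not, how can I add this row, carrying over the time-constant variables and filling the observations with NA?

It would be great if someone could shed some light on this.

Thanks a lot in advance!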


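EDIT

Thank you all for your replies. Here is what I ended up doing, which is a mix of several of the suggested approaches. The issue was that I have several time-varying variables per row (obs1-obsn), and I could not get dcast to accommodate that: its value.var argument does not take more than one column.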

# create all possible combinations of id and year
iddat <- expand.grid(id = unique(dataset$id), time = c(1996, 1999, 2002, 2005, 2008, 2011))
iddat <- iddat[order(iddat$id, iddat$time), ]

# add combinations to existing data; combinations missing so far get NA
dataset_new <- merge(dataset, iddat, all=TRUE, by=c("id", "time"))

# drop time-constant variables from data
dataset_new[c("tc1", "tc2", "tc3")] <- list(NULL)

# merge back time-constant variables from original data
# (id is needed as the merge key; unique() keeps one row per id,
# which avoids the duplicated rows the merge otherwise produces)
temp <- unique(dataset[c("id", "tc1", "tc2", "tc3")])
dataset_new <- merge(dataset_new, temp, by=c("id"))

# sort
dataset_new <- dataset_new[order(dataset_new$id, dataset_new$time), ]

rm(temp)
rm(iddat)

All the best, and thanks again, Matt

There may be a more elegant way, but here is one option. I assume you need all combinations of id and time, but not of tc1 (i.e. tc1 is tied to id).

# your data
df <- read.table(text = "    id  time    tc1     obs1
1   101 1       male    4
2   101 2       male    5
3   101 3       male    3
4   102 1       female  6
5   102 3       female  2
6   103 1       male    2", header = TRUE)

First convert the data to wide format to introduce the NAs, then convert back to long.

library('reshape2')

df_wide <- dcast(
  df, 
  id + tc1 ~ time,
  value.var = "obs1", 
  fill = NA
)

df_long <- melt(
  df_wide, 
  id.vars = c("id","tc1"), 
  variable.name = "time",
  value.name = "obs1"
)

# sort by id and then time
df_long[order(df_long$id, df_long$time), ]
   id    tc1 time obs1
1 101   male    1    4
4 101   male    2    5
7 101   male    3    3
2 102 female    1    6
5 102 female    2   NA
8 102 female    3    2
3 103   male    1    2
6 103   male    2   NA
9 103   male    3   NA
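
For the first question (whether a given id/time combination already exists), a minimal base-R sketch is to build the full grid and compare keys. It rebuilds the example data from the question so that it runs on its own; full_grid, key_full, key_have and missing_rows are illustrative names, not part of the answer above.

```r
# example data from the question
df <- data.frame(
  id   = c(101, 101, 101, 102, 102, 103),
  time = c(1, 2, 3, 1, 3, 1),
  tc1  = c("male", "male", "male", "female", "female", "male"),
  obs1 = c(4, 5, 3, 6, 2, 2)
)

# every id crossed with every observed time point
full_grid <- expand.grid(id = unique(df$id), time = sort(unique(df$time)))

# one key per row, then test membership
key_full <- paste(full_grid$id, full_grid$time)
key_have <- paste(df$id, df$time)

# combinations that have no row in df yet
missing_rows <- full_grid[!(key_full %in% key_have), ]
missing_rows <- missing_rows[order(missing_rows$id, missing_rows$time), ]
print(missing_rows)
```

On the example data this should list the three combinations (102, 2), (103, 2) and (103, 3), which are the rows shown with NA in the output above.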

You can create an empty dataset and then merge your matching records into it.

 # Create the full id x time grid. For your actual data, you would replace c(1:3) in id
 # with unique(yourdata$id) and adjust the number of time periods and repeats to match your data.
 id <- rep(c(1:3), each = 3)
 time <- rep(c(1:3), 3)
 df <- data.frame(id,time)


 test <- df[c(1,3,5,7,9),]
 test$tc1 <- c("male", "male", "female", "male", "male")
 test$obs1 <-c(4,5,3,6,2)

 merge(df, test, by.x = c("id","time"), by.y = c("id","time"), all.x = TRUE)

Result:

    id time    tc1 obs1
 1  1    1   male    4
 2  1    2   <NA>   NA
 3  1    3   male    5
 4  2    1   <NA>   NA
 5  2    2 female    3
 6  2    3   <NA>   NA
 7  3    1   male    6
 8  3    2   <NA>   NA
 9  3    3   male    2
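
In the result above, the added rows have NA in tc1 as well. If the time-constant variable should be carried along, as the question asks, one possible base-R sketch is to merge a one-row-per-id lookup table into the grid first. It uses the example data from the question and assumes tc1 really is constant within each id; tc_lookup, grid, tv_cols and filled are illustrative names.

```r
# example data from the question
df <- data.frame(
  id   = c(101, 101, 101, 102, 102, 103),
  time = c(1, 2, 3, 1, 3, 1),
  tc1  = c("male", "male", "male", "female", "female", "male"),
  obs1 = c(4, 5, 3, 6, 2, 2)
)

# one row per id holding the time-constant variable(s)
tc_lookup <- unique(df[c("id", "tc1")])

# full id x time grid
grid <- expand.grid(id = unique(df$id), time = sort(unique(df$time)))

# attach the time-constant variables to every grid row
grid <- merge(grid, tc_lookup, by = "id")

# left-join the time-varying columns; missing combinations become NA
tv_cols <- setdiff(names(df), "tc1")   # id, time, obs1
filled <- merge(grid, df[tv_cols], by = c("id", "time"), all.x = TRUE)
filled <- filled[order(filled$id, filled$time), ]
print(filled)
```

Because the time-varying columns are selected by name rather than reshaped, the same pattern should extend to several observation columns (obs1 to obsn) by listing all time-constant columns in tc_lookup and in the setdiff() call.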
