
Create a panel data frame

I would like to create a panel from a dataset that has one observation per unit at some given time period, such that every unit ends up with an observation for every time period. Using the following example:

id <- 1:4
year <- c(2005, 2008, 2008, 2007)
y <- c(1,0,0,1)
frame <- data.frame(id, year, y)
frame

  id year y
1  1 2005 1
2  2 2008 0
3  3 2008 0
4  4 2007 1

For each unique id, I would like there to be exactly one observation for each of the years 2005, 2006, 2007, and 2008 (the lower and upper time periods in this frame, and everything in between), with the outcome y set to 0 for every year in which there is no existing observation, so that the new frame looks like:

   id year y
1   1 2005 1
2   1 2006 0
3   1 2007 0
4   1 2008 0
....
13  4 2005 0
14  4 2006 0
15  4 2007 1
16  4 2008 0

I haven't had much success with loops. Any and all thoughts would be greatly appreciated.

1) reshape2. Create a grid g of all years crossed with all id values, and rbind it with frame.

Then, using the reshape2 package, cast frame from long to wide form and melt it back to long form. Finally, rearrange the rows and columns as desired.

The lines ending in a single # are only there to ensure that every year is present, so if we knew that to be the case, those lines could be omitted. The line ending in ## only rearranges the rows and columns, so if that does not matter, that line could be omitted too.

library(reshape2)

g <- with(frame, expand.grid(year = seq(min(year), max(year)), id = unique(id), y = 0)) #
frame <- rbind(frame, g) #

wide <- dcast(frame, year ~ id, fill = 0, fun.aggregate = sum, value.var = "y")
long <- melt(wide, id.vars = "year", variable.name = "id", value.name = "y")

long <- long[order(long$id, long$year), c("id", "year", "y")] ##

giving:

> long
   id year y
1   1 2005 1
2   1 2006 0
3   1 2007 0
4   1 2008 0
5   2 2005 0
6   2 2006 0
7   2 2007 0
8   2 2008 0
9   3 2005 0
10  3 2006 0
11  3 2007 0
12  3 2008 0
13  4 2005 0
14  4 2006 0
15  4 2007 1
16  4 2008 0

2) aggregate. A shorter solution is to run just the two lines that end with # above and then follow them with an aggregate call as shown. This solution uses no add-on packages.

g <- with(frame, expand.grid(year = seq(min(year), max(year)), id = unique(id), y = 0)) #
frame <- rbind(frame, g) # 

aggregate(y ~ year + id, frame, sum)[c("id", "year", "y")]

This gives the same answer as solution (1), except that, as a commenter noted, solution (1) makes id a factor whereas in this solution it is not.
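If a plain integer id is wanted after solution (1), the factor can be converted back. The following is only a minimal sketch, assuming the ids are numeric labels as in the example; id_factor is a hypothetical stand-in for long$id, not something from the original answer. It also shows why calling as.integer() directly on the factor is not enough.

```r
# Hypothetical stand-in for long$id from solution (1): a factor whose
# levels are the original numeric ids stored as text.
id_factor <- factor(c("1", "1", "2", "10"), levels = c("1", "2", "10"))

# as.integer() on a factor returns the internal level codes, not the labels.
codes <- as.integer(id_factor)

# Going through as.character() first recovers the original id values.
ids <- as.integer(as.character(id_factor))
```

With the example data the level codes happen to coincide with the ids, so the difference only shows up once the ids are not simply 1, 2, 3, and so on.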

Using data.table:

library(data.table)
DT <- data.table(frame, key=c("id", "year"))
comb <- CJ(1:4, 2005:2008) # like 'expand.grid', but faster + sets key
ans <- DT[comb][is.na(y), y:=0L] # perform a join (DT[comb]), then set NAs to 0
#     id year y
#  1:  1 2005 1
#  2:  1 2006 0
#  3:  1 2007 0
#  4:  1 2008 0
#  5:  2 2005 0
#  6:  2 2006 0
#  7:  2 2007 0
#  8:  2 2008 0
#  9:  3 2005 0
# 10:  3 2006 0
# 11:  3 2007 0
# 12:  3 2008 0
# 13:  4 2005 0
# 14:  4 2006 0
# 15:  4 2007 1
# 16:  4 2008 0
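For comparison, the same join-then-fill idea can be sketched in base R with merge. This is not part of the answer above, just an illustration under the assumption that the data are the example frame from the question; the names grid and panel are made up for this sketch. Here the grid is derived from the data rather than typed in by hand.

```r
# Example data from the question.
frame <- data.frame(id = 1:4,
                    year = c(2005, 2008, 2008, 2007),
                    y = c(1, 0, 0, 1))

# Full grid of every id crossed with every year in the observed range.
grid <- expand.grid(id = unique(frame$id),
                    year = seq(min(frame$year), max(frame$year)))

# Left join: keep every grid row, attach y where an observation exists.
panel <- merge(grid, frame, by = c("id", "year"), all.x = TRUE)

# Grid rows without a matching observation have y = NA; set those to 0.
panel$y[is.na(panel$y)] <- 0

# Sort by id, then year, and reset the row names.
panel <- panel[order(panel$id, panel$year), ]
rownames(panel) <- NULL
panel
```

The all.x = TRUE argument is what makes this a left join on the grid, playing the same role as DT[comb] in the data.table version.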

Maybe not an elegant solution, but anyway:

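# use the full range of years; unique(year) alone would skip 2006
df <- expand.grid(id = unique(frame$id), year = seq(min(frame$year), max(frame$year)))
frame <- frame[frame$y != 0,]
df$y <- 0
# existing observations come first, so duplicated() keeps them over the zero rows
df2 <- rbind(frame, df)
df2 <- df2[!duplicated(df2[,c("id", "year")]),]
df2 <- df2[order(df2$id, df2$year),]
rownames(df2) <- NULL
df2
#    id year y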
# 1   1 2005 1
# 2   1 2006 0
# 3   1 2007 0
# 4   1 2008 0
# 5   2 2005 0
# 6   2 2006 0
# 7   2 2007 0
# 8   2 2008 0
# 9   3 2005 0
# 10  3 2006 0
# 11  3 2007 0
# 12  3 2008 0
# 13  4 2005 0
# 14  4 2006 0
# 15  4 2007 1
# 16  4 2008 0
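Whichever approach is used, it may be worth checking that the result really is a balanced panel. A rough sketch of such a check follows; is_balanced is a hypothetical helper written for this illustration, and it assumes the columns are called id and year and that the years should cover the full range between the minimum and maximum.

```r
# Hypothetical helper: TRUE if each id has exactly one row for each year
# in `years` (by default the full range between the min and max year).
is_balanced <- function(panel, years = seq(min(panel$year), max(panel$year))) {
  # Using a factor with explicit levels makes entirely missing years count as 0.
  counts <- table(panel$id, factor(panel$year, levels = years))
  all(counts == 1)
}

# The original frame from the question is not balanced ...
frame <- data.frame(id = 1:4, year = c(2005, 2008, 2008, 2007), y = c(1, 0, 0, 1))
is_balanced(frame)  # FALSE

# ... whereas a full id-by-year grid is.
full <- expand.grid(id = 1:4, year = 2005:2008)
full$y <- 0
is_balanced(full)   # TRUE
```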


 