Append multiple rows to a data frame
I have a data frame with time points nested within persons (an unequal number of rows per person, with missing values). For each individual I want to add a new time point with NA on all variables.
data_long <- data.frame(id = factor(rep(1:3,each=4)), DV1 = c(1, 2, NA, 2), DV2 = c(2, 1, 2, 1), time = c(1989, 1995, 2003, 2010))
data_long$DV1 <- c(rnorm(12,0,1))
data_long$DV2 <- c(rnorm(12,0,1))
data_long$DV1[4] <- NA
data_long$DV2[8] <- NA
data_long[5,2:3] <- NA
data_long[12,2:3] <- NA
data_long <- data_long[-9,]
T0 <- 1980 # new time point
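for (i in min(as.numeric(data_long$id)):max(as.numeric(data_long$id))) {
  # prepend a row (id, NA, ..., NA, T0) to this person's rows
  temp <- rbind(c(data_long[data_long$id == i, ]$id[1], rep(NA, ncol(data_long[data_long$id == i, ]) - 2), T0),
                data_long[data_long$id == i, ])
  # append each person's block to a file, because the blocks have differing numbers of rows
  write.table(temp, "test.dat", sep = "\t", append = TRUE, row.names = FALSE, col.names = FALSE)
}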
data_long2 <- read.table("test.dat")
However, there must be a simpler way to append differing numbers of rows without actually writing the data to disk. I apologize for this simple question and would be happy to be enlightened.
This doesn't exactly match what you share as your desired output, but it does seem to better match what you describe:
Use expand.grid to create a data.frame to merge with your original data.frame. The "id" values will be just the existing unique "id" values in your source data.frame, and the "time" values will be the existing unique time points with the new one added.
## set.seed(1) was used for this
X <- expand.grid(id = unique(data_long$id),
time = c(1980, unique(data_long$time)))
merge(data_long, X, all.y = TRUE)
#    id time        DV1         DV2
# 1   1 1980         NA          NA
# 2   1 1989 -0.6264538 -0.62124058
# 3   1 1995  0.1836433 -2.21469989
# 4   1 2003 -0.8356286  1.12493092
# 5   1 2010         NA -0.04493361
# 6   2 1980         NA          NA
# 7   2 1989         NA          NA
# 8   2 1995 -0.8204684  0.94383621
# 9   2 2003  0.4874291  0.82122120
# 10  2 2010  0.7383247          NA
# 11  3 1980         NA          NA
# 12  3 1989         NA          NA  <---- This row is not there in your approach
# 13  3 1995 -0.3053884  0.78213630
# 14  3 2003  1.5117812  0.07456498
# 15  3 2010         NA          NA
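As a minimal sketch of why the expand.grid-plus-merge approach above also produces rows the loop does not, here is the same idea on a tiny made-up data frame. The name toy and its values are hypothetical and are not the asker's data. Because the grid holds every id/time combination, any combination missing from the source comes back as an all-NA row, not only the new time point.

```r
# Hypothetical stand-in for data_long: id 2 has no 1995 observation
toy <- data.frame(
  id   = factor(c(1, 1, 2)),
  time = c(1989, 1995, 1989),
  DV1  = c(0.5, 1.5, 2.5)
)

# Every combination of existing ids with the new plus existing time points
grid <- expand.grid(id = unique(toy$id), time = c(1980, unique(toy$time)))

# Keep all rows of the grid; combinations absent from toy get NA in DV1
filled <- merge(toy, grid, all.y = TRUE)
filled <- filled[order(filled$id, filled$time), ]
filled
```

Here filled has one row per id and time point, so id 2 gains a 1995 row with NA in addition to the 1980 row. If only the new time point should be added, a row-binding approach avoids that side effect.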
One more approach:
newrows <- data.frame(id=unique(data_long$id), DV1=NA, DV2=NA, time=T0)
res <- merge(newrows, data_long, all = TRUE)
res <- res[with(res, order(id, time)), ]
The result is:
> res
   id        DV1         DV2 time
5   1         NA          NA 1980
2   1 -0.6264538 -0.62124058 1989
3   1  0.1836433 -2.21469989 1995
1   1 -0.8356286  1.12493092 2003
4   1         NA -0.04493361 2010
9   2         NA          NA 1980
10  2         NA          NA 1989
6   2 -0.8204684  0.94383621 1995
7   2  0.4874291  0.82122120 2003
8   2  0.7383247          NA 2010
13  3         NA          NA 1980
11  3 -0.3053884  0.78213630 1995
12  3  1.5117812  0.07456498 2003
14  3         NA          NA 2010
Hope it helps,
alex
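A further sketch, not taken from either answer: since the new rows share no id/time pair with the existing data, they could also be stacked with rbind and then sorted, which skips the merge step entirely. The data frame toy and its values are made up for illustration; T0 plays the same role as in the question.

```r
# Hypothetical data with unequal numbers of rows per person
toy <- data.frame(
  id   = factor(c(1, 1, 2, 3, 3, 3)),
  DV1  = c(0.1, NA, 0.3, 0.4, 0.5, NA),
  DV2  = c(1.1, 1.2, NA, 1.4, 1.5, 1.6),
  time = c(1989, 1995, 1989, 1989, 1995, 2003)
)
T0 <- 1980  # new time point

# One all-NA row per person at the new time point
newrows <- data.frame(id = unique(toy$id), DV1 = NA_real_, DV2 = NA_real_, time = T0)

# Stack the new rows onto the data, then sort within person by time
res <- rbind(newrows, toy)
res <- res[order(res$id, res$time), ]
rownames(res) <- NULL
res
```

This adds exactly one row per person and leaves existing gaps (such as a person with no 1995 observation) as they are.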