Append multiple rows to data frame

I have a data frame with time points nested within persons (unequal numbers of rows per person, and missing values). For each individual I want to add a new time point with NAs on all variables.

Here is an example of my data:

data_long      <- data.frame(id = factor(rep(1:3,each=4)), DV1 = c(1, 2, NA, 2), DV2 = c(2, 1, 2, 1), time = c(1989, 1995, 2003, 2010))
data_long$DV1       <- c(rnorm(12,0,1))
data_long$DV2       <- c(rnorm(12,0,1))
data_long$DV1[4]    <- NA
data_long$DV2[8]    <- NA
data_long[5,2:3]    <- NA
data_long[12,2:3]   <- NA
data_long       <- data_long[-9,]

T0 <- 1980 # new time point

This is what I want:

for (i in min(as.numeric(data_long$id)):max(as.numeric(data_long$id))) {
  person <- data_long[data_long$id == i, ]
  # new first row: id, NA for every variable, then the new time point
  temp <- rbind(c(person$id[1], rep(NA, ncol(person) - 2), T0), person)
  write.table(temp, "test.dat", sep = "\t", append = TRUE,
              row.names = FALSE, col.names = FALSE)
}

data_long2 <- read.table("test.dat")

However, there must be a simpler way to append differing numbers of rows without actually writing the data to disk. I apologize for this simple question and would be happy to be enlightened.

This doesn't exactly match what you share as your desired output, but it does seem to better match what you describe:

Use expand.grid to create a data.frame to merge with your original data.frame. The "id" column will hold just the existing unique "id" values from your source data.frame, and the "time" column will hold the existing time values with the new value added.

## set.seed(1) was used for this
X <- expand.grid(id = unique(data_long$id), 
                 time = c(1980, unique(data_long$time)))
merge(data_long, X, all.y = TRUE)
#    id time        DV1         DV2
# 1   1 1980         NA          NA
# 2   1 1989 -0.6264538 -0.62124058
# 3   1 1995  0.1836433 -2.21469989
# 4   1 2003 -0.8356286  1.12493092
# 5   1 2010         NA -0.04493361
# 6   2 1980         NA          NA
# 7   2 1989         NA          NA
# 8   2 1995 -0.8204684  0.94383621
# 9   2 2003  0.4874291  0.82122120
# 10  2 2010  0.7383247          NA
# 11  3 1980         NA          NA
# 12  3 1989         NA          NA  <---- This row is not there in your approach
# 13  3 1995 -0.3053884  0.78213630
# 14  3 2003  1.5117812  0.07456498
# 15  3 2010         NA          NA
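
The extra row flagged above appears because expand.grid crosses every "id" with every "time" value, so merge with all.y = TRUE fills in any combination the original data lacked. A minimal sketch on made-up toy data (the names toy, grid and full and all values are hypothetical, not from the question):

```r
# Toy data: person 2 has no 1995 observation (hypothetical values)
toy <- data.frame(id = factor(c(1, 1, 2)),
                  time = c(1989, 1995, 1989),
                  DV1 = c(0.5, -0.2, 1.1))

# Cross every id with every existing time point plus the new one
grid <- expand.grid(id = unique(toy$id),
                    time = c(1980, unique(toy$time)))

# Keep every row of the grid; combinations missing from toy get NA
full <- merge(toy, grid, all.y = TRUE)

nrow(grid)                 # 2 ids x 3 time points
full[is.na(full$DV1), ]    # the rows the grid added
```

If only the new time point should be added, and gaps such as person 2's missing 1995 row should stay absent, the grid approach is probably not what you want.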

One more approach:

newrows <- data.frame(id=unique(data_long$id), DV1=NA, DV2=NA, time=T0)
res <- merge(newrows, data_long, all.x=TRUE, all.y=TRUE)
res <- res[with(res, order(id, time)), ]

The result is:

> res
   id        DV1         DV2 time
5   1         NA          NA 1980
2   1 -0.6264538 -0.62124058 1989
3   1  0.1836433 -2.21469989 1995
1   1 -0.8356286  1.12493092 2003
4   1         NA -0.04493361 2010
9   2         NA          NA 1980
10  2         NA          NA 1989
6   2 -0.8204684  0.94383621 1995
7   2  0.4874291  0.82122120 2003
8   2  0.7383247          NA 2010
13  3         NA          NA 1980
11  3 -0.3053884  0.78213630 1995
12  3  1.5117812  0.07456498 2003
14  3         NA          NA 2010

Hope it helps,

alex
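
Because the new rows have exactly the same columns as the data, a plain rbind may be enough, with no merge and no temporary file. This is only a sketch under that assumption: the data are rebuilt here in simplified form (without the manually inserted NAs from the question), and res2 is a hypothetical name.

```r
set.seed(1)
# Simplified version of the question's data: 3 persons, one row dropped
data_long <- data.frame(id = factor(rep(1:3, each = 4)),
                        DV1 = rnorm(12), DV2 = rnorm(12),
                        time = c(1989, 1995, 2003, 2010))
data_long <- data_long[-9, ]
T0 <- 1980  # new time point

# One all-NA row per person at the new time point
newrows <- data.frame(id = unique(data_long$id), DV1 = NA, DV2 = NA, time = T0)

# Stack the new rows onto the data, then sort by person and time
res2 <- rbind(newrows, data_long)
res2 <- res2[order(res2$id, res2$time), ]
rownames(res2) <- NULL
res2
```

Unlike the full-grid merge, this adds exactly one row per person and leaves existing gaps (such as person 3's missing 1989 row) untouched.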
