How to extract vectors of different lengths from a large data frame depending on multiple conditions in R
I have a data frame in R that consists of 3 columns. It looks a bit like this:
x id trialNumber
1 1.4788 subj_01 trial010
2 1.4794 subj_01 trial010
3 1.4823 subj_01 trial010
4 1.4845 subj_01 trial010
5 1.4889 subj_01 trial010
6 1.4901 subj_01 trial010
...
20121 -1.3597 subj_03 trial042
20122 -1.3601 subj_03 trial042
20123 -1.3667 subj_03 trial042
20124 -1.3713 subj_03 trial042
20125 -1.3800 subj_03 trial042
20126 -1.3857 subj_03 trial042
I want to create a new data frame that consists of multiple columns for x, where the columns are defined by id and trialNumber. The number of rows for each combination of id and trialNumber varies. The number of rows in the new data frame should correspond to the largest number of rows among all the id and trialNumber combinations. The result should look something like this:
x1 x2 ... xi
1.4788 1.5678 ...
1.4794 1.5789 ...
1.4823 1.5984 ...
1.4845 ... ...
1.4889 NA ...
1.4901 NA -1.3713
... ... -1.3800
NA ... -1.3857
x1 to xi in the new data frame should correspond to each unique combination of id and trialNumber in the original data frame, e.g. x1 would correspond to all x where id == 'subj_01' and trialNumber == 'trial010'.
There are a lot of combinations of id and trialNumber, so I don't want to manually define the conditions by which to subset the original data frame.
You could try (a suggestion after reading the above comments):
tapply(df$x, paste0(df$id,df$trialNumber), function(x) data.frame(mean = mean(x), lower_limit = mean(x) - sd(x), upper_limit = mean(x) + sd(x)))
$subj_01trial010
mean lower_limit upper_limit
1 1.484871 1.479965 1.489778
$subj_03trial042
mean lower_limit upper_limit
1 -1.370583 -1.381177 -1.35999
Or, using aggregate, you get a nicer output format:
aggregate(x ~ id + trialNumber, data = df, FUN = function(x) c(mean = mean(x), lower_limit = mean(x) - sd(x), upper_limit = mean(x) + sd(x)))
id trialNumber x.mean x.lower_limit x.upper_limit
1 subj_01 trial010 1.484871 1.479965 1.489778
2 subj_03 trial042 -1.370583 -1.381177 -1.359990
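One detail worth knowing about the aggregate output: when FUN returns a vector, the result holds those values in a single matrix column named x, even though it prints as three separate columns. The following is a minimal sketch of how that could be flattened into ordinary columns with do.call(data.frame, ...). The small data frame df is made up for illustration and only reuses the question's column names.

```r
# Hypothetical sample data (assumption: same column names as in the question)
df <- data.frame(
  x = c(1.4788, 1.4794, 1.4823, -1.3597, -1.3601),
  id = c("subj_01", "subj_01", "subj_01", "subj_03", "subj_03"),
  trialNumber = c("trial010", "trial010", "trial010", "trial042", "trial042")
)

# Same call as in the answer: FUN returns a named vector of length 3
agg <- aggregate(x ~ id + trialNumber, data = df,
                 FUN = function(x) c(mean = mean(x),
                                     lower_limit = mean(x) - sd(x),
                                     upper_limit = mean(x) + sd(x)))

# agg has only 3 columns: id, trialNumber, and a matrix column x
ncol(agg)
is.matrix(agg$x)

# Flatten the matrix column into separate numeric columns
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, the summary values can be accessed directly, for example as flat$x.mean.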
Here's an approach if you really want columns of x for each combination of trial and subject bound together:
# step 1: create one vector of x per combination of trial and subject
step1 <- split(dat2$x, list(dat2$trial, dat2$subject))
# calculate the max length (to add padding)
max_length <- max(sapply(step1, length))
# make all vectors the same length, padded with NA
step2 <- lapply(step1, function(x) {
  length(x) <- max_length
  x
})
# combine into a matrix, one column per combination
res <- do.call(cbind, step2)
res
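The code above uses the column names of the generated test data (trial, subject) and returns a matrix. As a rough sketch, the same idea could be adapted to the question's column names (id, trialNumber) and to its requested x1 ... xi naming as shown here. The small data frame df is invented for illustration, and drop = TRUE is an added assumption so that combinations that never occur do not produce all-NA columns.

```r
# Hypothetical sample data (assumption: same column names as in the question)
df <- data.frame(
  x = c(1.4788, 1.4794, 1.4823, -1.3597, -1.3601),
  id = c("subj_01", "subj_01", "subj_01", "subj_03", "subj_03"),
  trialNumber = c("trial010", "trial010", "trial010", "trial042", "trial042")
)

# One vector of x per (id, trialNumber) pair; drop = TRUE omits
# combinations that do not appear in the data
groups <- split(df$x, list(df$id, df$trialNumber), drop = TRUE)

# Pad every vector with NA up to the length of the longest one
max_length <- max(lengths(groups))
padded <- lapply(groups, function(v) {
  length(v) <- max_length
  v
})

# Bind the padded vectors into a data frame and name the columns x1 ... xi
wide <- as.data.frame(do.call(cbind, padded))
names(wide) <- paste0("x", seq_along(wide))
wide
```

If the link between each column and its id/trialNumber pair matters later, names(groups) can be kept alongside the result before the columns are renamed.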
Code used for generating the data:
set.seed(100)
dat1 <- expand.grid(trial = sprintf("trial_%.03d", 1:10),
                    subject = sprintf("subj_%.02d", 1:3))
dat2 <- dat1[sample(nrow(dat1), 1000, TRUE), ]
dat2$x <- rnorm(nrow(dat2))