How to extract vectors of different lengths from a large data frame based on multiple conditions in R

Question

I have a data frame in R that consists of 3 columns. It looks a bit like this:

            x      id trialNumber
1      1.4788 subj_01    trial010
2      1.4794 subj_01    trial010
3      1.4823 subj_01    trial010
4      1.4845 subj_01    trial010
5      1.4889 subj_01    trial010
6      1.4901 subj_01    trial010
...
20121 -1.3597 subj_03    trial042
20122 -1.3601 subj_03    trial042
20123 -1.3667 subj_03    trial042
20124 -1.3713 subj_03    trial042
20125 -1.3800 subj_03    trial042
20126 -1.3857 subj_03    trial042

I want to create a new data frame with multiple columns for x, where each column is defined by a combination of id and trialNumber. The number of rows varies between combinations of id and trialNumber. The number of rows in the new data frame should equal the largest number of rows among all the id and trialNumber combinations, with shorter columns padded with NA. The result should look something like this:

x1      x2   ... xi
1.4788  1.5678  ...
1.4794  1.5789  ...
1.4823  1.5984  ...
1.4845  ...     ...
1.4889  NA      ...
1.4901  NA      -1.3713
...     ...     -1.3800
NA      ...     -1.3857

x1 to xi in the new data frame should correspond to each unique combination of id and trialNumber in the original data frame, e.g. x1 would correspond to all x where id == 'subj_01' and trialNumber == 'trial010'.

There are a lot of combinations of id and trialNumber, so I don't want to manually define the conditions by which to subset the original data frame.

Answer 1

You could try (a suggestion after reading the above comments):

tapply(df$x, paste0(df$id,df$trialNumber), function(x) data.frame(mean = mean(x), lower_limit = mean(x) - sd(x), upper_limit = mean(x) + sd(x)))
$subj_01trial010
      mean lower_limit upper_limit
1 1.484871    1.479965    1.489778

$subj_03trial042
       mean lower_limit upper_limit
1 -1.370583   -1.381177    -1.35999

Or, using aggregate, you get a nicer output format:

aggregate(x ~ id + trialNumber, data = df, FUN = function(x) c(mean = mean(x), lower_limit = mean(x) - sd(x), upper_limit = mean(x) + sd(x)))
       id trialNumber    x.mean x.lower_limit x.upper_limit
1 subj_01    trial010  1.484871      1.479965      1.489778
2 subj_03    trial042 -1.370583     -1.381177     -1.359990
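
One detail that may matter if the summary is processed further: when FUN returns a vector, aggregate stores the results in a single matrix column named x, even though it prints as x.mean, x.lower_limit and x.upper_limit. The sketch below uses made-up toy values (not the data from the question) and shows one common way to flatten that into ordinary columns with do.call(data.frame, ...).

```r
# Toy data in the question's layout; the x values are invented for illustration
df <- data.frame(
  x = c(1, 2, 3, 10, 20, 30, 40),
  id = c(rep("subj_01", 3), rep("subj_03", 4)),
  trialNumber = c(rep("trial010", 3), rep("trial042", 4))
)

# Same call as above: mean and mean +/- one standard deviation per group
agg <- aggregate(x ~ id + trialNumber, data = df,
                 FUN = function(x) c(mean = mean(x),
                                     lower_limit = mean(x) - sd(x),
                                     upper_limit = mean(x) + sd(x)))

ncol(agg)  # 3: id, trialNumber and one matrix column x

# Flatten the matrix column into three regular numeric columns
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, flat$x.mean, flat$x.lower_limit and flat$x.upper_limit can be used like any other column.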

Answer 2

Here's an approach if you really want the columns of x for each combination of trial and subject bound together:

#step 1: create vector of x per combination

step1 <- split(dat2$x, list(dat2$trial,dat2$subject))

#calculate max length(to add padding)
max_length <- max(sapply(step1,length))

#make all vectors same length padded with NA
step2 <- lapply(step1, function(x){
  length(x) <- max_length
  x
})

#combine

res <- do.call(cbind,step2)
res

Code used to generate the data:

set.seed(100)

dat1 <-expand.grid(trial=sprintf("trial_%.03d",1:10), 
                   subject= sprintf("subj_%.02d",1:3))

dat2 <- dat1[sample(nrow(dat1), 1000, replace = TRUE),]
dat2$x <- rnorm(nrow(dat2))
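
For reference, the same split-and-pad idea can be applied to the column names from the question (id and trialNumber). This is only a minimal sketch on made-up toy values. The names groups, padded and wide are illustrative, and drop = TRUE is an added assumption: it skips id/trialNumber combinations that never occur, which would otherwise become all-NA columns.

```r
# Toy data in the question's layout; the x values are invented for illustration
df <- data.frame(
  x = c(1.4788, 1.4794, 1.4823, 1.5678, 1.5789,
        -1.3713, -1.3800, -1.3857, -1.3900),
  id = c(rep("subj_01", 5), rep("subj_03", 4)),
  trialNumber = c(rep("trial010", 3), rep("trial011", 2), rep("trial042", 4))
)

# One vector of x per id/trialNumber combination that actually occurs
groups <- split(df$x, list(df$id, df$trialNumber), drop = TRUE)

# Pad every vector with NA up to the length of the longest group
max_length <- max(lengths(groups))
padded <- lapply(groups, function(v) {
  length(v) <- max_length
  v
})

# Bind the padded vectors into a data frame and name the columns x1 ... xi
wide <- as.data.frame(do.call(cbind, padded))
names(wide) <- paste0("x", seq_along(padded))
wide
```

names(groups) still holds the original id.trialNumber labels, so the mapping from x1 ... xi back to each combination is not lost by the renaming.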
