Randomly select values from each row across columns in a data.frame and average them in R

This question is similar to one I asked previously: "randomly sum values from rows and assign them to 2 columns in R".

Since I'm having difficulties with R, this question is about both programming and statistics. I'm very new to both.

I have a data.frame with 219 subjects in one column. The remaining 7 columns each hold, in every row, a number that represents the difference in response time for that particular subject between the two conditions of the experiment.

This is what the data look like (I'm using the head function, otherwise it would be too long):

    > head(RTsdiff)
      subject   block3diff   block4diff   block5diff   block6diff   block7diff
    1   40002  0.076961798  0.046067460 -0.027012048  0.017920261  0.002660317
    2   40004  0.037558511 -0.016535211 -0.044306743 -0.011541667  0.044422892
    3   40006 -0.017063123 -0.031156150 -0.084003876 -0.070227149 -0.113382784
    4   40008 -0.015204017 -0.009954545 -0.004082353  0.006327839  0.022335271
    5   40009  0.006055829 -0.045376437 -0.002725572  0.016443182  0.032848128
    6   40010 -0.003017857 -0.034398268 -0.034476491  0.014158824 -0.036592982
       block8diff    block9dif
    1  0.03652273  0.037306173
    2 -0.08032784 -0.150682051
    3 -0.09724864 -0.060338684
    4 -0.04783333  0.006539326 
    5 -0.01459465 -0.067916667
    6 -0.01868126 -0.034409584

What I need is code that, for every subject (i.e. every row), samples either 3 or 4 values, averages them, and adds the result to a new vector (called half1). The vector half2 should hold the average of the values that were not sampled the first time.

So, supposing the data.frame I want to create is called "RTshalves", the first column should be the same column of subjects as in RTsdiff. The second column should hold, in its first row, the average of the randomly selected values for the first subject, and the third column should hold the average of the first subject's values that were not chosen in the first sampling. The second row of columns 2 and 3 should hold the same information for subject 2 (subject 40004 in my data.frame), and so on for all 219 subjects.

Let's suppose that the first sample randomly selected 3 values for subject 1 (block3diff, block5diff and block9diff), so that the values of block4diff, block6diff, block7diff and block8diff automatically form the other half. Then what I would expect to see (considering only the first of the 219 rows) is:

   Subject     Half1       Half2 
    40002   0.02908531   0.02579269
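
As a check on the arithmetic, the two numbers above can be reproduced from the first row of the head() output. This is only a sketch: subj1 and picked are hypothetical names, the values are typed in by hand from the printout, the last column is named block9diff as in the text (the printed header shows block9dif), and the split is the one assumed in the example rather than a random draw.

```r
# Values for subject 40002, copied by hand from the head(RTsdiff) output
subj1 <- c(block3diff = 0.076961798, block4diff = 0.046067460,
           block5diff = -0.027012048, block6diff = 0.017920261,
           block7diff = 0.002660317, block8diff = 0.03652273,
           block9diff = 0.037306173)

# The split assumed in the example (fixed here, not a random draw)
picked <- c("block3diff", "block5diff", "block9diff")

half1 <- mean(subj1[picked])                         # mean of the 3 chosen blocks
half2 <- mean(subj1[setdiff(names(subj1), picked)])  # mean of the remaining 4

round(c(half1 = half1, half2 = half2), 8)
```

Rounded to 8 decimals, this should agree with the Half1 and Half2 values shown for subject 40002.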

If anyone is interested in the statistics behind this, I'm trying to do a split-half reliability test to check the consistency of a test. The rationale is that if the average RT difference is a reliable estimator of the effect, then the differences for half of one participant's blocks should be correlated with the differences for the other half of the blocks.
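
Once the two halves exist, the correlation described here could be computed roughly as follows. This is a minimal sketch under stated assumptions: RTshalves is filled with made-up random numbers purely so the code runs, and the Spearman-Brown step is a commonly used correction for split-half correlations that I am adding for illustration, not something the analysis above requires.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for RTshalves: made-up numbers, not real data
RTshalves <- data.frame(subject = 1:219,
                        half1 = rnorm(219),
                        half2 = rnorm(219))

# Split-half correlation: correlate the two half-averages across subjects
r <- cor(RTshalves$half1, RTshalves$half2)

# Spearman-Brown correction (an assumption added for illustration)
r_sb <- 2 * r / (1 + r)

c(r = r, spearman_brown = r_sb)
```

Because the split is random, the correlation will change from one split to the next, so it may be worth repeating the split several times and looking at the spread of the results.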

Help is much appreciated. Thanks in advance.

half1 is easy: write your own function to do what you want to each row (taken in as a vector), then apply it to the rows:

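    eachrow <- function(x) {
       mean(sample(x, 3))  # average of 3 randomly chosen values from the row
    }
    # apply(X, MARGIN, FUN): data first, 1 = by row; drop the subject column
    RTsdiff$half1 <- apply(RTsdiff[, -1], 1, eachrow)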

To get half2, you'll probably want to compute it at the same time. ddply might be easiest for this (use your subject variable as the grouping variable so that each row is processed on its own). Like this:

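    # Random example data (6 subjects, 8 value columns), so results vary per run
    RTsdiff <- data.frame(subject=seq(6))
    RTsdiff <- cbind( RTsdiff, matrix(runif(6*8),ncol=8) )

    library(plyr)
    # x is the one-row data.frame for a single subject; n is how many values to sample
    eachrow <- function(x,n=3) {
      x <- as.numeric(x[,2:ncol(x)]) # eliminate the ID column to make things easier, make a vector
      s <- seq(length(x))
      ones <- sample(s,n) # positions of the values that go into half1
      twos <- !(s %in% ones) # logical mask for the remaining values (half2)
      data.frame( half1=mean(x[ones]), half2=mean(x[twos]) )
    }
    ddply( RTsdiff, .(subject), eachrow)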

      subject     half1     half2
    1       1 0.4700982 0.5350610
    2       2 0.6173469 0.5351995
    3       3 0.2245246 0.6807482
    4       4 0.6330649 0.6316353
    5       5 0.6388060 0.6629077
    6       6 0.4652086 0.5073034

There are plenty of more elegant ways of doing this. In particular, I used ddply for its ability to easily output data.frames, so that I could return both half1 and half2 from the function and have them combined nicely at the end. But ddply also takes data.frames as input, so a little extra work is needed to convert each row to a vector first. Feeding sapply a transposed data.frame would possibly be simpler.
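
For what it's worth, a base R version without plyr might look like the sketch below. It uses apply() over the rows of a numeric matrix rather than sapply on a transposed data.frame, and it draws either 3 or 4 values per row as the question asks. The helper name split_row, the seed, and the made-up data are all assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Made-up data shaped like the question's: an ID column plus 7 value columns
RTsdiff <- data.frame(subject = seq(6), matrix(runif(6 * 7), ncol = 7))

# split_row is a hypothetical helper; x is one row as a numeric vector
split_row <- function(x) {
  n <- sample(3:4, 1)               # randomly take either 3 or 4 values
  ones <- sample(seq_along(x), n)   # positions that go into half1
  c(half1 = mean(x[ones]), half2 = mean(x[-ones]))  # the rest form half2
}

# apply() hands each row to split_row and returns a 2 x nrow matrix,
# so transpose it before binding it to the subject column
halves <- t(apply(as.matrix(RTsdiff[, -1]), 1, split_row))
RTshalves <- data.frame(subject = RTsdiff$subject, halves)
RTshalves
```

Returning a named vector from the helper lets apply() build both columns in one pass, which is what keeps the two halves complementary for every row.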

