for loop with function that writes to 3 separate columns R or dplyr/reshape solution?

I'm a total beginner with for loops, so I apologize if there's already a clear answer to this question; I wasn't able to find anything that I understood how to apply to this specific problem. I also started to try a dplyr implementation (shown at the end) but couldn't figure that out either.

Here's my question: there's a function that derives 3 values from a vector. I'd like to write those 3 values to the same df as new columns. The function is timefit from the retimes library in R. If I run it on the whole df:

  a1 <-  timefit(data$RT)
  a1:
        mu: 480.3346 
     sigma: 77.8531 
       tau: 376.7426 

If I place the values into a data frame with df <- data.frame(a1@par), I get:

      a1.par
mu    480.33462
sigma 77.85305
tau   376.74257

I'd like to run it separately for each subID and for each level of another variable, "location" (a factor with two levels), so that I end up with something like

subID location mu sigma tau
1      0        500 50   400
1      0        500 50   400
1      1        376 50   410
1      1        376 50   410
2      0        400 60   400
2      0        400 60   400
2      1        410 60   410  
2      1        410 60   410

I got started with

for (subID in data) {
  timefit(data$RT)
}

But I know that's not going to actually do what I need it to do. Values are extracted from the timefit model with @par in long format, so do I need to tell timefit to write to 3 separate column headers? Any suggestions?

I also thought about using ddply, but the last line below is tripping me up, because the output is in long format and I need it to be wide. I've played with reshape a bit, but I'm having trouble figuring it out.

data <- data %>% 
  group_by(subID, location) %>%
  mutate(timefit_out = timefit(RT))

Thanks for your help!

You can use summarise instead of mutate here to generate a list-column containing one data.frame per (subID, location) group, built from that group's timefit result. Each of these data frames holds the mu, sigma, and tau from timefit as columns. Then use unnest on this list-column to produce the result you want.

library(retimes)
library(dplyr)
library(tidyr)
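result <- data %>%
  group_by(subID, location) %>%
  # one-row data frame (mu, sigma, tau) per group, stored in a list-column
  summarise(timefit_out = list(data.frame(t(attr(timefit(RT), "par"))))) %>%
  # name the column explicitly; recent tidyr versions warn on a bare unnest()
  unnest(timefit_out)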

Note that we extract the "par" attribute from the object returned by timefit and then transpose it with t, so that mu, sigma, and tau become columns.
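
The effect of the transpose can be seen without retimes. A minimal sketch, assuming the "par" slot is a named numeric vector (which is consistent with the output shown in the question); par_vec is a hypothetical stand-in for attr(timefit(RT), "par"), with values copied from the question:

```r
# Hypothetical stand-in for attr(timefit(RT), "par"): a named numeric vector.
par_vec <- c(mu = 480.33462, sigma = 77.85305, tau = 376.74257)

# data.frame() on the vector alone gives one column with three rows (long).
long <- data.frame(par_vec)

# t() turns the vector into a 1 x 3 matrix whose column names are the
# parameter names, so data.frame() then gives one row with three columns (wide).
wide <- data.frame(t(par_vec))

dim(long)
dim(wide)
names(wide)
```

The wide, one-row form is what lets unnest spread each group's fit across the mu, sigma, and tau columns.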

Here, we assume that your input data is a data frame with columns subID, location, and a numeric column of reaction times, RT, that is passed to timefit. A simulated example of such a dataset is given by:

data <- structure(list(subID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
location = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
RT = c(0.341764254728332, 0.775535081513226, 0.281827432336286, 
0.23970171622932, 0.00226009078323841, 0.385179498931393, 
0.645917195128277, 0.812101020244882, 0.183301427634433, 
0.981765420176089, 0.656369511503726, 0.824469136772677, 
0.923240559641272, 0.598261737963185, 0.309975759591907, 
0.778991278028116, 0.757012664806098, 0.869985132943839, 
0.439378245733678, 0.8420404586941, 0.643788777757436, 0.381316626211628, 
0.123881611274555, 0.540528740268201, 0.661961955949664, 
0.0592848095111549, 0.904047027230263, 0.190083365887403, 
0.963809312786907, 0.0925120878964663, 0.117538752267137, 
0.451085010776296, 0.703220259631053, 0.378451474476606, 
0.305718191433698, 0.70383172808215, 0.699415655340999, 0.740436099236831, 
0.429179352009669, 0.205358384409919)), .Names = c("subID", 
"location", "RT"), row.names = c(NA, 40L), class = "data.frame")
##   subID location          RT
##1      1        0 0.341764255
##2      1        0 0.775535082
##3      1        0 0.281827432
##4      1        0 0.239701716
##5      1        0 0.002260091
##6      1        0 0.385179499
##7      1        0 0.645917195
##8      1        0 0.812101020
##9      1        0 0.183301428
##10     1        0 0.981765420
##11     1        1 0.656369512
##12     1        1 0.824469137
##13     1        1 0.923240560
##14     1        1 0.598261738
##15     1        1 0.309975760
##16     1        1 0.778991278
##17     1        1 0.757012665
##18     1        1 0.869985133
##19     1        1 0.439378246
##20     1        1 0.842040459
##21     2        0 0.643788778
##22     2        0 0.381316626
##23     2        0 0.123881611
##24     2        0 0.540528740
##25     2        0 0.661961956
##26     2        0 0.059284810
##27     2        0 0.904047027
##28     2        0 0.190083366
##29     2        0 0.963809313
##30     2        0 0.092512088
##31     2        1 0.117538752
##32     2        1 0.451085011
##33     2        1 0.703220260
##34     2        1 0.378451474
##35     2        1 0.305718191
##36     2        1 0.703831728
##37     2        1 0.699415655
##38     2        1 0.740436099
##39     2        1 0.429179352
##40     2        1 0.205358384

The values for RT in this example were generated with runif, so they lie between 0 and 1. Your values are on a very different scale, but that should not matter here.
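
For reference, a dataset with the same layout could be simulated along these lines. This is only a sketch: the seed is an arbitrary choice and will not reproduce the exact values listed above, and sim is a hypothetical name.

```r
set.seed(1)  # arbitrary seed, chosen only to make this sketch repeatable

sim <- data.frame(
  subID    = rep(1:2, each = 20),                   # 2 subjects, 20 rows each
  location = rep(rep(0:1, each = 10), times = 2),   # 10 rows per location within subject
  RT       = runif(40)                              # uniform draws between 0 and 1
)

# Cross-tabulate to confirm 10 observations in each (subID, location) cell.
table(sim$subID, sim$location)
```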

Using this data, we get:

print(result)
##Source: local data frame [4 x 5]
##Groups: subID [2]
##
##  subID location        mu     sigma         tau
##  <int>    <int>     <dbl>     <dbl>       <dbl>
##1     1        0 0.5275058 0.2553621 0.007086207
##2     1        1 0.2609386 0.1583494 0.085449559
##3     2        0 0.5205647 0.1994942 0.027329115
##4     2        1 0.4632886 0.2881343 0.008026460
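
The result has one row per (subID, location) group, whereas the table in the question repeats the parameters on every original row. One way to get that shape is to join the summary back onto the input. A minimal base R sketch with merge, where obs and params are hypothetical stand-ins for data and result, the RT values are made up, and the parameter values are copied from the question's desired table:

```r
# Hypothetical stand-in for the raw data: two rows per (subID, location).
obs <- data.frame(
  subID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  location = c(0, 0, 1, 1, 0, 0, 1, 1),
  RT       = c(510, 495, 380, 372, 405, 398, 412, 409)  # made-up values
)

# Hypothetical stand-in for the per-group summary.
params <- data.frame(
  subID    = c(1, 1, 2, 2),
  location = c(0, 1, 0, 1),
  mu       = c(500, 376, 400, 410),
  sigma    = c(50, 50, 60, 60),
  tau      = c(400, 410, 400, 410)
)

# Each row of obs picks up the mu, sigma and tau of its own group.
wide <- merge(obs, params, by = c("subID", "location"))
wide
```

With dplyr loaded, left_join(data, result, by = c("subID", "location")) should give the same shape.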

If you want a dplyr solution, what you are probably looking for is do. It lets a function return data frames, though it may require a bit of manipulation. Specifically, it is designed to work over groups rather than (necessarily) rows, so you will have to set groups if you want the result to include some of the original information (depending on the structure of your function).

For this, I am generating a simple data set:

myData <-
  data.frame(
    RT = 1:4
  )

You will also need to construct a function that returns the values you want as a data.frame. For your use, you will probably calculate the result of timefit inside the function, then extract each of the values as a column to return:

myFunc <- function(x){
  data.frame(a= x + 1, b = x + 2, c = x + 3)
}
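
For the timefit case, such a function might look like the sketch below. Both names are hypothetical: fit_stub is a placeholder so the example runs without retimes (it returns simple summaries, not an ex-Gaussian fit), and fitAsRow is the wrapper. In real use the marked line would call timefit and read its par slot, as in the question.

```r
# Hypothetical placeholder for timefit(x)@par: returns a named numeric vector.
# These are NOT ex-Gaussian estimates, only values carrying the same names.
fit_stub <- function(x) {
  c(mu = mean(x), sigma = sd(x), tau = max(x) - min(x))
}

# Wrapper in the spirit of myFunc: one row, one column per parameter.
fitAsRow <- function(x) {
  p <- fit_stub(x)  # with retimes this would be: p <- timefit(x)@par
  data.frame(mu = p[["mu"]], sigma = p[["sigma"]], tau = p[["tau"]])
}

fitAsRow(c(2, 4, 6))
```

Grouping by subID and location and calling do(fitAsRow(.$RT)) would then return one row of parameters per group.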

Then, group by the columns you want to split by (these are also returned), and call do:

myData %>%
  group_by(RT) %>%
  do((myFunc(.$RT)))

Which, in this case, returns this:

     RT     a     b     c
1     1     2     3     4
2     2     3     4     5
3     3     4     5     6
4     4     5     6     7
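
Since the question also asks about a for loop, the same per-group idea can be sketched in base R. This is not part of the dplyr approach above, just one possible loop. fit_stub is again a hypothetical placeholder for timefit(x)@par, and toy is made-up data using the question's column names.

```r
# Hypothetical placeholder for timefit(x)@par (not a real ex-Gaussian fit).
fit_stub <- function(x) c(mu = mean(x), sigma = sd(x), tau = max(x) - min(x))

# Made-up data with the same column names as in the question.
toy <- data.frame(
  subID    = rep(1:2, each = 4),
  location = rep(rep(0:1, each = 2), times = 2),
  RT       = c(500, 520, 380, 390, 400, 430, 410, 415)
)

# Pre-create the three new columns, then fill them group by group.
toy$mu    <- NA_real_
toy$sigma <- NA_real_
toy$tau   <- NA_real_
groups <- unique(toy[, c("subID", "location")])

for (i in seq_len(nrow(groups))) {
  # Rows of toy that belong to the i-th (subID, location) combination.
  rows <- toy$subID == groups$subID[i] & toy$location == groups$location[i]
  p <- fit_stub(toy$RT[rows])
  toy$mu[rows]    <- p[["mu"]]
  toy$sigma[rows] <- p[["sigma"]]
  toy$tau[rows]   <- p[["tau"]]
}

toy
```

Every row keeps its RT and gains the three values computed for its own group, which matches the layout requested in the question.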
