for loop with function that writes to 3 separate columns: R or dplyr/reshape solution?
I'm a total beginner with for loops, so I apologize if there's already a clear answer to this question, but I wasn't able to find anything that I understood how to apply to this specific problem. I also started to try a dplyr implementation at the end but couldn't figure that out either.
Here's my question: there's a function that derives 3 values from a vector. I'd like to write those 3 values to the same df as new columns. The function is timefit from the retimes library in R. If I run it on the whole df:
a1 <- timefit(data$RT)
a1:
mu: 480.3346
sigma: 77.8531
tau: 376.7426
If I place the values into a df:
df <- data.frame(a1@par)
a1.par
mu 480.33462
sigma 77.85305
tau 376.74257
I'd like to run it separately for each subID and for each level of another variable, "location" (a factor with two levels), so that I end up with something like:
subID location mu sigma tau
1 0 500 50 400
1 0 500 50 400
1 1 376 50 410
1 1 376 50 410
2 0 400 60 400
2 0 400 60 400
2 1 410 60 410
2 1 410 60 410
I got started with:
for (subID in data) {
timefit(data$RT)
}
But I know that's not going to actually do what I need it to do. Values are extracted from the timefit model with @par in long format, so do I need to tell timefit to write to 3 separate column headers? Any suggestions?
Also, I thought about using ddply, but that last line is tripping me up, because the output is long but I need it to be wide. I've messed with reshape a bit, but I'm having trouble figuring it out:
data <- data %>%
group_by(subID, location) %>%
mutate(timefit_out = timefit(RT))
Thanks for your help!
You can use summarise instead of mutate here to generate a list-column containing a data.frame from each (subID, location)'s timefit. These data frames encode the mu, sigma, and tau from the result of timefit as columns. Then, use unnest to unnest this list-column to generate the result you want.
library(retimes)
library(dplyr)
library(tidyr)
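result <- data %>%
  group_by(subID, location) %>%
  # one-row data frame (mu, sigma, tau) per group, stored in a list-column
  summarise(timefit_out = list(data.frame(t(attr(timefit(RT), "par"))))) %>%
  # name the list-column explicitly; newer tidyr versions warn on a bare unnest()
  unnest(timefit_out)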
Note that we extract the "par" attribute from the object returned by timefit and then transpose it with t to form columns for mu, sigma, and tau.
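To see why the transpose matters, here is a minimal base R sketch. It assumes the "par" slot behaves like a named numeric vector, which is what the question's printed output suggests; par_vec is a hypothetical stand-in with the values copied from the question, so no timefit call is needed.

```r
# Hypothetical stand-in for a1@par: a named numeric vector
# (values copied from the question's output).
par_vec <- c(mu = 480.33462, sigma = 77.85305, tau = 376.74257)

# Without t(): the names become row names, giving 3 rows and 1 column (long).
long <- data.frame(par_vec)

# With t(): the vector becomes a 1 x 3 matrix whose column names are
# mu, sigma, tau, so data.frame() yields one row with three columns (wide).
wide <- data.frame(t(par_vec))

dim(long)
dim(wide)
names(wide)
```

The wide, one-row form is what lets each group contribute a single row with separate mu, sigma, and tau columns.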
Here, we assume that your input data is a data frame with columns subID, location, and the numeric column of reaction times RT that is input to timefit. A simulated example of such a dataset is given by:
data <- structure(list(subID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
location = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
RT = c(0.341764254728332, 0.775535081513226, 0.281827432336286,
0.23970171622932, 0.00226009078323841, 0.385179498931393,
0.645917195128277, 0.812101020244882, 0.183301427634433,
0.981765420176089, 0.656369511503726, 0.824469136772677,
0.923240559641272, 0.598261737963185, 0.309975759591907,
0.778991278028116, 0.757012664806098, 0.869985132943839,
0.439378245733678, 0.8420404586941, 0.643788777757436, 0.381316626211628,
0.123881611274555, 0.540528740268201, 0.661961955949664,
0.0592848095111549, 0.904047027230263, 0.190083365887403,
0.963809312786907, 0.0925120878964663, 0.117538752267137,
0.451085010776296, 0.703220259631053, 0.378451474476606,
0.305718191433698, 0.70383172808215, 0.699415655340999, 0.740436099236831,
0.429179352009669, 0.205358384409919)), .Names = c("subID",
"location", "RT"), row.names = c(NA, 40L), class = "data.frame")
## subID location RT
##1 1 0 0.341764255
##2 1 0 0.775535082
##3 1 0 0.281827432
##4 1 0 0.239701716
##5 1 0 0.002260091
##6 1 0 0.385179499
##7 1 0 0.645917195
##8 1 0 0.812101020
##9 1 0 0.183301428
##10 1 0 0.981765420
##11 1 1 0.656369512
##12 1 1 0.824469137
##13 1 1 0.923240560
##14 1 1 0.598261738
##15 1 1 0.309975760
##16 1 1 0.778991278
##17 1 1 0.757012665
##18 1 1 0.869985133
##19 1 1 0.439378246
##20 1 1 0.842040459
##21 2 0 0.643788778
##22 2 0 0.381316626
##23 2 0 0.123881611
##24 2 0 0.540528740
##25 2 0 0.661961956
##26 2 0 0.059284810
##27 2 0 0.904047027
##28 2 0 0.190083366
##29 2 0 0.963809313
##30 2 0 0.092512088
##31 2 1 0.117538752
##32 2 1 0.451085011
##33 2 1 0.703220260
##34 2 1 0.378451474
##35 2 1 0.305718191
##36 2 1 0.703831728
##37 2 1 0.699415655
##38 2 1 0.740436099
##39 2 1 0.429179352
##40 2 1 0.205358384
The values for RT in this example are generated using runif, so they are between 0 and 1. Your values will be very different, but that should not matter here.
Using this data, we get:
print(result)
##Source: local data frame [4 x 5]
##Groups: subID [2]
##
## subID location mu sigma tau
## <int> <int> <dbl> <dbl> <dbl>
##1 1 0 0.5275058 0.2553621 0.007086207
##2 1 1 0.2609386 0.1583494 0.085449559
##3 2 0 0.5205647 0.1994942 0.027329115
##4 2 1 0.4632886 0.2881343 0.008026460
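This result has one row per (subID, location). The question's desired output instead repeats the fitted values on every original row. One way to get there, sketched here under assumptions, is to join the per-group table back onto the trial-level data. Both per_group and trials are hypothetical toy data frames: the parameter values are copied from the question's desired output and the RT values are made up.

```r
library(dplyr)

# Hypothetical per-group summary, shaped like the result above
# (values copied from the question's desired output).
per_group <- data.frame(
  subID    = c(1L, 1L, 2L, 2L),
  location = c(0L, 1L, 0L, 1L),
  mu       = c(500, 376, 400, 410),
  sigma    = c(50, 50, 60, 60),
  tau      = c(400, 410, 400, 410)
)

# Hypothetical trial-level data: two trials per (subID, location); RT is made up.
trials <- data.frame(
  subID    = rep(1:2, each = 4),
  location = rep(c(0L, 1L), each = 2, times = 2),
  RT       = c(610, 540, 480, 455, 502, 530, 515, 470)
)

# Each trial row picks up the mu, sigma, tau of its own group.
wide <- left_join(trials, per_group, by = c("subID", "location"))
print(wide)
```

With the real data, the same join would be left_join(data, result, by = c("subID", "location")).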
If you are looking for a dplyr solution, what you are probably looking for is do. It allows a function to return data.frames, though it may require a bit of manipulation. Specifically, it is designed to work over groups, rather than (necessarily) rows. So, you will have to set groups if you want it to return some of the original information (and depending on the structure of your function).
For this, I am generating a simple data set:
myData <-
data.frame(
RT = 1:4
)
You will also need to construct a function that returns the values you want as a data.frame. For your use, you will probably calculate the result of timefit in the function, then extract each of the values as a column to return:
myFunc <- function(x){
data.frame(a= x + 1, b = x + 2, c = x + 3)
}
Then, group by the columns you want to separate by (and return), and call do:
myData %>%
group_by(RT) %>%
do((myFunc(.$RT)))
Which, in this case, returns this:
RT a b c
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
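Applied to the layout in the question, the same pattern might look like the sketch below. Note the assumptions: fitFunc is a hypothetical placeholder that returns mean, standard deviation, and median under the names mu, sigma, and tau purely to show the shape of the output. These are not ex-Gaussian estimates. With retimes loaded, the body would presumably be data.frame(t(timefit(x)@par)) instead. The toy data frame is made up.

```r
library(dplyr)

# Hypothetical placeholder for a timefit() wrapper. It returns a one-row
# data frame with three columns; the values are simple summaries, NOT
# ex-Gaussian parameter estimates.
fitFunc <- function(x) {
  data.frame(mu = mean(x), sigma = sd(x), tau = median(x))
}

# Made-up trial-level data: two trials per (subID, location).
toy <- data.frame(
  subID    = rep(1:2, each = 4),
  location = rep(c(0L, 1L), each = 2, times = 2),
  RT       = c(610, 540, 480, 455, 502, 530, 515, 470)
)

# do() calls fitFunc once per group and binds the one-row results
# together, keeping the grouping columns alongside them.
out <- toy %>%
  group_by(subID, location) %>%
  do(fitFunc(.$RT))

print(out)
```

Grouping by both subID and location is what produces one row of parameters per combination, rather than one row per input row as in the simple example above.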