Loop ordinal regression statistical analysis and save the data in R
Could you please help me with a loop? I am relatively new to R. The short version of the data looks like this:
sNumber blockNo running TrialNo wordTar wordTar1 Freq Len code code2
1 1 1 5 spouse violent 5011 6 1 2
1 1 1 5 violent spouse 17873 7 2 1
1 1 1 5 spouse aviator 5011 6 1 1
1 1 1 5 aviator wife 515 7 1 1
1 1 1 5 wife aviator 87205 4 1 1
1 1 1 5 aviator spouse 515 7 1 1
1 1 1 9 stability usually 12642 9 1 3
1 1 1 9 usually requires 60074 7 3 4
1 1 1 9 requires client 25949 8 4 1
1 1 1 9 client requires 16964 6 1 4
2 2 1 5 grimy cloth 757 5 2 1
2 2 1 5 cloth eats 8693 5 1 4
2 2 1 5 eats whitens 3494 4 4 4
2 2 1 5 whitens woman 18 7 4 1
2 2 1 5 woman penguin 162541 5 1 1
2 2 1 9 pie customer 8909 3 1 1
2 2 1 9 customer sometimes 13399 8 1 3
2 2 1 9 sometimes reimburses 96341 9 3 4
2 2 1 9 reimburses sometimes 65 10 4 3
2 2 1 9 sometimes gangster 96341 9 3 1
I have code for an ordinal regression analysis for one participant in one trial (eye-tracking data - eyeData) that looks like this:
#------------set the path and import the library-----------------
setwd("/AscTask-3/Data")
library(ordinal)
#-------------read the data----------------
read.delim(file.choose(), header=TRUE) -> eyeData
#-------------extract 1 trial from one participant---------------
ss <- subset(eyeData, sNumber == 6 & runningTrialNo == 21)
#-------------delete duplicates = refixations-----------------
ss.s <- ss[!duplicated(ss$wordTar), ]
#-------------change the raw frequencies to log freq--------------
ss.s$lFreq <- log(ss.s$Freq)
#-------------add a new column with sequential numbers as a factor ------------------
ss.s$rankF <- as.factor(seq(nrow(ss.s)))
#------------ estimate an ordered logistic regression model - fit ordered logit model----------
m <- clm(rankF~lFreq*Len, data=ss.s, link='probit')
summary(m)
#---------------get confidence intervals (CI)------------------
(ci <- confint(m))
#----------odd ratios (OR)--------------
exp(coef(m))
The eyeData file is a huge mass of data consisting of 91832 observations of 11 variables. In total there are 41 participants with 78 trials each. In my code I extract the data from one trial for each participant to run the analysis. However, it takes a long time to run the analysis manually for all trials for all participants. Could you please help me create a loop that will read in all 78 trials from all 41 participants and save the output of the statistics (I want to save summary(m), ci, and coef(m)) in one file?
Thank you in advance!
You could generate a unique identifier for every trial of every participant. Then you could loop over all unique values of this identifier and subset the data accordingly. Then you run the regressions and save the output as an R object:
eyeData$uniqueIdent <- paste(eyeData$sNumber, eyeData$runningTrialNo, sep = "-")
uniqueID <- unique(eyeData$uniqueIdent)
for (un in uniqueID) {
ss <- eyeData[eyeData$uniqueIdent == un,]
ss <- ss[!duplicated(ss$wordTar), ] #maybe do this outside the loop
ss$lFreq <- log(ss$Freq) #you could do this outside the loop too
#create DV
ss$rankF <- as.factor(seq(nrow(ss)))
m <- clm(rankF~lFreq*Len, data=ss, link='probit')
seeSumm <- summary(m)
ci <- confint(m)
oddsR <- exp(coef(m))
save(seeSumm, ci, oddsR, file = paste("toSave_", un, ".Rdata", sep = ""))
# add -un- to the output file name to be able to identify where it came from
}
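To read one of the saved files back later, `load()` restores the objects under their original names. A minimal sketch, assuming the file-name pattern used above (participant 6, trial 21 here is just an example):

```r
# Restores the objects seeSumm, ci and oddsR into the workspace
load("toSave_6-21.Rdata")
seeSumm   # the stored summary(m)
ci        # the stored confidence intervals
oddsR     # the stored odds ratios
```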
A variation of this would be to combine the output of every iteration in a list: create an empty list "gatherRes" before the loop, then, after running the estimation and post-estimation commands, fill the corresponding element of that list:
gatherRes <- vector(mode = "list", length = length(unique(eyeData$uniqueIdent))) ## before the loop
gatherRes[[un]] <- list(seeSumm, ci, oddsR) ## last line inside the loop
If you're concerned with speed, you could consider writing a function that does all this and using lapply (or mclapply).
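A sketch of that approach (untested; it reuses the fitting steps and the `uniqueIdent`/`uniqueID` objects from the loop above, and assumes the `ordinal` package is loaded):

```r
# Fit the model for one participant-trial identifier and return the statistics
fit_one <- function(un) {
  ss <- eyeData[eyeData$uniqueIdent == un, ]
  ss <- ss[!duplicated(ss$wordTar), ]
  ss$lFreq <- log(ss$Freq)
  ss$rankF <- as.factor(seq(nrow(ss)))
  m <- clm(rankF ~ lFreq * Len, data = ss, link = "probit")
  list(summary = summary(m), confint = confint(m), oddsR = exp(coef(m)))
}

gatherRes <- lapply(uniqueID, fit_one)
names(gatherRes) <- uniqueID  # label each element by its identifier
# On Unix-alikes, parallel::mclapply(uniqueID, fit_one, mc.cores = 4)
# can replace lapply to run trials in parallel.
```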
Here is a solution using the plyr package (it should be faster than a for loop). Since you don't provide a reproducible example, I'll use the iris data as an example.
First make a function to calculate your statistics of interest and return them as a list. For example:
# Function to return summary, confidence intervals and coefficients from lm
lm_stats = function(x){
m = lm(Sepal.Width ~ Sepal.Length, data = x)
return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}
Then use the dlply function, using your variables of interest as grouping:
data(iris)
library(plyr) #if not installed do install.packages("plyr")
#Using "Species" as grouping variable
results = dlply(iris, c("Species"), lm_stats)
This will return a list of lists containing the output of summary, confint and coef for each species.
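Since dlply names the list elements after the grouping values, the individual results can then be pulled out by name (a small sketch, assuming the `results` object created above):

```r
results$setosa$coef          # coefficients of the model fitted to setosa
results[["versicolor"]]$confint  # confidence intervals for versicolor
```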
For your specific case, the function could look like this (not tested):
ordFit_stats = function(x){
#Remove duplicates
x = x[!duplicated(x$wordTar), ]
# Make log frequencies
x$lFreq <- log(x$Freq)
# Make ranks
x$rankF <- as.factor(seq(nrow(x)))
# Fit model
m <- clm(rankF~lFreq*Len, data=x, link='probit')
# Return list of statistics
return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}
And then:
results = dlply(eyeData, c("sNumber", "TrialNo"), ordFit_stats)
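To keep everything in one file, as requested, the whole list of results can be saved in a single .Rdata file (the file name here is just an example):

```r
save(results, file = "ordFit_results.Rdata")
# load("ordFit_results.Rdata") restores the full results list later
```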