
Loop ordinal regression statistical analysis and save the data in R

Could you please help me with a loop? I am relatively new to R. A short version of the data looks like this:

sNumber  blockNo running TrialNo    wordTar   wordTar1   Freq Len code code2
1        1       1       5           spouse    violent   5011   6    1     2
1        1       1       5          violent     spouse  17873   7    2     1
1        1       1       5           spouse    aviator   5011   6    1     1
1        1       1       5          aviator       wife    515   7    1     1
1        1       1       5             wife    aviator  87205   4    1     1
1        1       1       5          aviator     spouse    515   7    1     1
1        1       1       9        stability    usually  12642   9    1     3
1        1       1       9          usually   requires  60074   7    3     4
1        1       1       9         requires     client  25949   8    4     1
1        1       1       9           client   requires  16964   6    1     4
2        2       1       5            grimy      cloth    757   5    2     1
2        2       1       5            cloth       eats   8693   5    1     4
2        2       1       5             eats    whitens   3494   4    4     4
2        2       1       5          whitens      woman     18   7    4     1
2        2       1       5            woman    penguin 162541   5    1     1
2        2       1       9              pie   customer   8909   3    1     1
2        2       1       9         customer  sometimes  13399   8    1     3
2        2       1       9        sometimes reimburses  96341   9    3     4
2        2       1       9       reimburses  sometimes     65  10    4     3
2        2       1       9        sometimes   gangster  96341   9    3     1

I have code for an ordinal regression analysis of one trial from one participant (eye-tracking data, eyeData) that looks like this:

#------------set the path and import the library-----------------
setwd("/AscTask-3/Data")
library(ordinal)

#-------------read the data----------------
read.delim(file.choose(), header=TRUE) -> eyeData

#-------------extract 1 trial from one participant---------------
ss <- subset(eyeData, sNumber == 6 & runningTrialNo == 21)

#-------------delete duplicates = refixations-----------------
ss.s <- ss[!duplicated(ss$wordTar), ] 

#-------------change the raw frequencies to log freq--------------
ss.s$lFreq <- log(ss.s$Freq)

#-------------add a new column with sequential numbers as a factor ------------------
ss.s$rankF <- as.factor(seq(nrow(ss.s))) 

#------------ estimate an ordered logistic regression model - fit ordered logit model----------
m <- clm(rankF~lFreq*Len, data=ss.s, link='probit')
summary(m)

#---------------get confidence intervals (CI)------------------
(ci <- confint(m)) 

#----------odd ratios (OR)--------------
exp(coef(m))

The eyeData file is a very large dataset consisting of 91832 observations of 11 variables. In total there are 41 participants with 78 trials each. In my code I extract the data for one trial from one participant to run the analysis. However, it takes a long time to run the analysis manually for all trials of all participants. Could you please help me create a loop that will go through all 78 trials from all 41 participants and save the statistical output (I want to save summary(m), ci, and coef(m)) in one file?

Thank you in advance!

You could generate a unique identifier for every trial of every participant. Then you could loop over all unique values of this identifier and subset the data accordingly. Then you run the regressions and save the output as an R object:

eyeData$uniqueIdent <- paste(eyeData$sNumber, eyeData$runningTrialNo, sep = "-")
uniqueID <- unique(eyeData$uniqueIdent)
for (un in uniqueID) {
   ss <- eyeData[eyeData$uniqueIdent == un,]
   ss <- ss[!duplicated(ss$wordTar), ] #maybe do this outside the loop
   ss$lFreq <- log(ss$Freq)  #you could do this outside the loop too
   #create DV
   ss$rankF <- as.factor(seq(nrow(ss)))
   m <- clm(rankF~lFreq*Len, data=ss, link='probit')
   seeSumm <- summary(m)
   ci <- confint(m) 
   oddsR <- exp(coef(m))
   save(seeSumm, ci, oddsR, file = paste("toSave_", un, ".Rdata", sep = ""))
   # -un- is added to the output file name to be able to identify where it came from
}

Variations of this could include collecting the output of every iteration in a list: create an empty list before the loop, then, after running the estimation and post-estimation commands in each iteration, combine the results in a list and store it in the previously created list "gatherRes":

gatherRes <- vector(mode = "list", length = length(uniqueID))  ## before the loop
names(gatherRes) <- uniqueID  ## before the loop, so gatherRes[[un]] fills an existing slot
gatherRes[[un]] <- list(seeSumm, ci, oddsR)  ## last line inside the loop
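
Put together, the list variant might look like the sketch below. It is not the original analysis: iris and lm() stand in for eyeData and clm() so that it runs without the data file, and the element names summary, ci and coef are illustrative.

```r
# Stand-in data and model (assumption): iris and lm() replace eyeData and clm()
groups <- unique(as.character(iris$Species))

# Preallocate a named list with one slot per group
gatherRes <- vector(mode = "list", length = length(groups))
names(gatherRes) <- groups

for (g in groups) {
  ss <- iris[iris$Species == g, ]
  m <- lm(Sepal.Width ~ Sepal.Length, data = ss)
  # Store the three pieces under the group's name
  gatherRes[[g]] <- list(summary = summary(m), ci = confint(m), coef = coef(m))
}

names(gatherRes)
gatherRes[["setosa"]]$coef
```

Naming the elements makes it possible to pull out a single piece later, for example gatherRes[["setosa"]]$ci.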

If you're concerned with speed, you could consider writing a function that does all this and using lapply (or mclapply).
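
A minimal sketch of the function-plus-lapply idea follows. Again iris and lm() are stand-ins for eyeData and clm(), and fit_one is a hypothetical name. The tryCatch wrapper is an addition not mentioned above: with several thousand fits, it seems likely that at least one will throw an error, and this keeps the run going.

```r
# Stand-in data and model (assumption): iris and lm() replace eyeData and clm()
fit_one <- function(ss) {
  # tryCatch() returns NULL for a group whose fit throws an error,
  # so one bad trial does not stop the whole run
  tryCatch({
    m <- lm(Sepal.Width ~ Sepal.Length, data = ss)
    list(summary = summary(m), ci = confint(m), coef = coef(m))
  }, error = function(e) NULL)
}

# split() cuts the data frame into one piece per group; for eyeData this could be
# split(eyeData, list(eyeData$sNumber, eyeData$runningTrialNo), drop = TRUE)
pieces <- split(iris, iris$Species)
results <- lapply(pieces, fit_one)

names(results)
```

parallel::mclapply(pieces, fit_one) takes the same two leading arguments, so it can be swapped in for lapply; note that it does not run in parallel on Windows.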

Here is a solution using the plyr package (it should be faster than a for loop).

Since you don't provide a reproducible example, I'll use the iris data as an example.

First make a function to calculate your statistics of interest and return them as a list. For example:

# Function to return summary, confidence intervals and coefficients from lm
lm_stats = function(x){
  m = lm(Sepal.Width ~ Sepal.Length, data = x)

  return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}

Then use the dlply function, with your variables of interest as the grouping:

data(iris)
library(plyr) #if not installed do install.packages("plyr")

#Using "Species" as grouping variable
results = dlply(iris, c("Species"), lm_stats)

This will return a list of lists, containing the output of summary, confint and coef for each species.
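
To show how such a list of lists can be used afterwards, here is a small sketch. It assumes a results object of the same shape as the one from the dlply call above, but builds it with base R's split and lapply so that it runs without plyr; coef_table is a hypothetical name.

```r
# Base R stand-in for the dlply() call (assumption: same list-of-lists shape)
lm_stats <- function(x) {
  m <- lm(Sepal.Width ~ Sepal.Length, data = x)
  list(summary = summary(m), confint = confint(m), coef = coef(m))
}
results <- lapply(split(iris, iris$Species), lm_stats)

# One element per species, each holding three named pieces
results$setosa$confint

# Stack the coefficient vectors into one matrix, one row per species
coef_table <- do.call(rbind, lapply(results, function(r) r$coef))
coef_table
```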

For your specific case, the function could look like this (not tested):

ordFit_stats = function(x){

  #Remove duplicates
  x = x[!duplicated(x$wordTar), ]

  # Make log frequencies
  x$lFreq <- log(x$Freq)

  # Make ranks
  x$rankF <- as.factor(seq(nrow(x)))

  # Fit model
  m <- clm(rankF~lFreq*Len, data=x, link='probit')

  # Return list of statistics
  return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}

And then:

results = dlply(eyeData, c("sNumber", "TrialNo"), ordFit_stats)
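
Since the question asks for everything in one file, one possible way to write such a results list out is sketched below. The stand-in list is built from iris with lm(), and the file names are hypothetical; tempdir() is used only so the sketch is self-contained.

```r
# Stand-in results list (assumption): same shape as the list returned by dlply()
results <- lapply(split(iris, iris$Species), function(x) {
  m <- lm(Sepal.Width ~ Sepal.Length, data = x)
  list(summary = summary(m), confint = confint(m), coef = coef(m))
})

# Hypothetical file names; replace tempdir() with your own output folder
rds_file <- file.path(tempdir(), "all_results.rds")
txt_file <- file.path(tempdir(), "all_results.txt")

# Binary copy that can be reloaded later with readRDS()
saveRDS(results, rds_file)

# Human-readable copy of the printed output
writeLines(capture.output(print(results)), txt_file)

reloaded <- readRDS(rds_file)
```

The .rds file keeps the R objects intact for further work, while the text file is only meant for reading.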
