Looping an ordinal regression analysis and saving the output in R

Question

Could you help me write a loop? I am fairly new to R. A short version of the data looks like this:

sNumber  blockNo running TrialNo    wordTar   wordTar1   Freq Len code code2
1        1       1       5           spouse    violent   5011   6    1     2
1        1       1       5          violent     spouse  17873   7    2     1
1        1       1       5           spouse    aviator   5011   6    1     1
1        1       1       5          aviator       wife    515   7    1     1
1        1       1       5             wife    aviator  87205   4    1     1
1        1       1       5          aviator     spouse    515   7    1     1
1        1       1       9        stability    usually  12642   9    1     3
1        1       1       9          usually   requires  60074   7    3     4
1        1       1       9         requires     client  25949   8    4     1
1        1       1       9           client   requires  16964   6    1     4
2        2       1       5            grimy      cloth    757   5    2     1
2        2       1       5            cloth       eats   8693   5    1     4
2        2       1       5             eats    whitens   3494   4    4     4
2        2       1       5          whitens      woman     18   7    4     1
2        2       1       5            woman    penguin 162541   5    1     1
2        2       1       9              pie   customer   8909   3    1     1
2        2       1       9         customer  sometimes  13399   8    1     3
2        2       1       9        sometimes reimburses  96341   9    3     4
2        2       1       9       reimburses  sometimes     65  10    4     3
2        2       1       9        sometimes   gangster  96341   9    3     1

I have code that runs an ordinal regression for one trial of one participant (eye-tracking data, eyeData), shown below:

#------------set the path and import the library-----------------
setwd("/AscTask-3/Data")
library(ordinal)

#-------------read the data----------------
read.delim(file.choose(), header=TRUE) -> eyeData

#-------------extract 1 trial from one participant---------------
ss <- subset(eyeData, sNumber == 6 & runningTrialNo == 21)

#-------------delete duplicates = refixations-----------------
ss.s <- ss[!duplicated(ss$wordTar), ] 

#-------------change the raw frequencies to log freq--------------
ss.s$lFreq <- log(ss.s$Freq)

#-------------add a new column with sequential numbers as a factor ------------------
ss.s$rankF <- as.factor(seq(nrow(ss.s))) 

#------------ fit a cumulative link model (ordered probit, since link='probit') ----------
m <- clm(rankF~lFreq*Len, data=ss.s, link='probit')
summary(m)

#---------------get confidence intervals (CI)------------------
(ci <- confint(m)) 

#----------exponentiated coefficients (these are odds ratios only under a logit link)--------------
exp(coef(m))

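The eyeData file is large: 91832 observations of 11 variables. There are 41 participants with 78 trials each. In my code I extract one trial from one participant and run the analysis on it. Doing this by hand for every trial of every participant would take a very long time. Could you help me create a loop that goes through all 78 trials of all 41 participants and saves the statistical output (I would like to save summary(m), ci and coef(m)) in one file?

Thank you in advance!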

Answer 1

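You can generate a unique identifier for each trial of each participant, loop over all unique values of that identifier, and subset the data accordingly. Then run the regression and save the output as R objects: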

eyeData$uniqueIdent <- paste(eyeData$sNumber, eyeData$runningTrialNo, sep = "-")
uniqueID <- unique(eyeData$uniqueIdent)
for (un in uniqueID) {
   ss <- eyeData[eyeData$uniqueIdent == un, ]  # the column is uniqueIdent; uniqueID holds its unique values
   ss <- ss[!duplicated(ss$wordTar), ] #maybe do this outside the loop
   ss$lFreq <- log(ss$Freq)  #you could do this outside the loop too
   #create DV
   ss$rankF <- as.factor(seq(nrow(ss)))
   m <- clm(rankF~lFreq*Len, data=ss, link='probit')
   seeSumm <- summary(m)
   ci <- confint(m) 
   oddsR <- exp(coef(m))
   save(seeSumm, ci, oddsR, file = paste("toSave_", un, ".Rdata", sep = ""))
   # add -un- to the output file to be able identify where it came from
}

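A variation would be to collect the output of each iteration in a list: create an empty list, gatherRes, before the loop, and at the end of each iteration, after the estimation and post-estimation commands, store that iteration's results in it:

gatherRes <- vector(mode = "list", length = length(unique(eyeData$uniqueIdent)))  ##before the loop
names(gatherRes) <- unique(eyeData$uniqueIdent)  ##so gatherRes[[un]] fills an existing slot
gatherRes[[un]] <- list(seeSumm, ci, oddsR)  ##last line inside the loop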

If you are concerned about speed, you could write a function that does all of this and call it with lapply (or mclapply from the parallel package).
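The function-plus-lapply idea could look roughly like the sketch below. It uses base R only; `fit_by_group` and `toy_fit` are hypothetical names, and `lm` on the built-in `iris` data stands in for the `clm` call so the example runs without the eye-tracking file. The `tryCatch` wrapper is an addition that is not part of the answer above: it keeps one failing fit from aborting the whole run.

```r
# Hypothetical helper: split a data frame by one or more grouping columns
# and apply a fitting function to each piece.
fit_by_group <- function(data, group_cols, fit_fun) {
  pieces <- split(data, data[group_cols], drop = TRUE)
  lapply(pieces, function(piece) {
    # Return NULL instead of stopping if the model cannot be fitted
    tryCatch(fit_fun(piece), error = function(e) NULL)
  })
}

# Stand-in for the clm-based steps; swap in the real model here
toy_fit <- function(x) {
  m <- lm(Sepal.Width ~ Sepal.Length, data = x)
  list(summary = summary(m), ci = confint(m), coef = coef(m))
}

res <- fit_by_group(iris, "Species", toy_fit)
names(res)  # one element per group
```

For the eye-tracking data the call would presumably be something like `fit_by_group(eyeData, c("sNumber", "runningTrialNo"), your_clm_function)`. On Unix-alike systems, replacing `lapply` with `parallel::mclapply` inside the helper is one way to spread the groups over several cores.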

Answer 2

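Here is a solution using the plyr package (it should be faster than a for loop).

Since you did not provide a reproducible example, I will use the iris data as an example.

First, create a function that computes the statistics you are interested in and returns them as a list. For example: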

# Function to return summary, confidence intervals and coefficients from lm
lm_stats = function(x){
  m = lm(Sepal.Width ~ Sepal.Length, data = x)

  return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}

Then use the dlply function, with the variable(s) you are interested in as the grouping:

data(iris)
library(plyr) #if not installed do install.packages("plyr")

#Using "Species" as grouping variable
results = dlply(iris, c("Species"), lm_stats)

This returns a list of lists containing the summary, confint and coef output for each species.
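To show how such a list of lists can be inspected, here is a minimal sketch. It builds the list with base R's `split` and `lapply` rather than `dlply`, so that it runs without plyr; this should give the same kind of named list, with one element per species. `coef_table` is a name introduced only for this illustration.

```r
lm_stats <- function(x) {
  m <- lm(Sepal.Width ~ Sepal.Length, data = x)
  list(summary = summary(m), confint = confint(m), coef = coef(m))
}

# Base R stand-in for dlply(iris, "Species", lm_stats)
results <- lapply(split(iris, iris$Species), lm_stats)

# Pull out one piece of output for one group
results$setosa$confint

# Stack the coefficients of all groups into one matrix (one row per group)
coef_table <- do.call(rbind, lapply(results, `[[`, "coef"))
coef_table
```

Stacking works here because every group yields the same two coefficients. If the number of coefficients differs between groups, the pieces would have to be kept as a list instead.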

For your specific case, the function might look like this (untested):

ordFit_stats = function(x){

  #Remove duplicates
  x = x[!duplicated(x$wordTar), ]

  # Make log frequencies
  x$lFreq <- log(x$Freq)

  # Make ranks
  x$rankF <- as.factor(seq(nrow(x)))

  # Fit model
  m <- clm(rankF~lFreq*Len, data=x, link='probit')

  # Return list of statistics
  return(list(summary = summary(m), confint = confint(m), coef = coef(m)))
}

And then:

results = dlply(eyeData, c("sNumber", "TrialNo"), ordFit_stats)
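Since the question asks for everything in one file, one possible way to store such a results list is sketched below. The file names are hypothetical, `tempdir()` is used only so the example writes nothing into a project folder, and a small `lm`-based list on `iris` stands in for the real `results` object.

```r
# Toy stand-in for the list returned by dlply; names are the group labels
results <- lapply(split(iris, iris$Species), function(x) {
  m <- lm(Sepal.Width ~ Sepal.Length, data = x)
  list(summary = summary(m), confint = confint(m), coef = coef(m))
})

# Hypothetical file names
rds_file <- file.path(tempdir(), "ordFit_results.rds")
txt_file <- file.path(tempdir(), "ordFit_results.txt")

# Binary copy that can be reloaded later as an R object
saveRDS(results, rds_file)
reloaded <- readRDS(rds_file)

# Human-readable copy: the printed output of every element in one text file
writeLines(capture.output(print(results)), txt_file)
```

The .rds file is the one to keep for further analysis in R. The text file is only a printed record and cannot be read back as model output.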
