Running multiple lines of R code that differ by a single variable each time, to improve readability

I am looking to improve the readability of my code by finding a way to "loop" or "re-run" lines of code that are very similar but differ by a single variable each time.

My actual data analysis involves running a number of blmer calls from the blme package. Each of my analyses has a dependent variable, an independent variable (of which there are many), a "wave" variable (as data were collected at 3 timepoints), and a unique participant id as a random effect.

I'm trying to build a number of models, all of which are very similar, but each differs in what is entered as the independent variable.

In the code below, I have outlined some more details, built a new, fictitious data file, and tried to recreate models similar to those in my actual file.

The code runs without problems on my real data and on the fictitious data here. What I'd like to draw attention to is how, even with just 3 models included (as in my example below), the code starts to become long and repetitive.

##test script##
library(dplyr)
library(tidyverse)
library(blme)
library(panelr)
#packages loaded - I'm not sure dplyr and tidyverse are strictly needed, I just
#loaded them in case. blme is for the Bayesian models coming later, and panelr
#provides long_panel(), which is used below to reshape the data.
#Everything below worked in RStudio on my end.

##build a file
DV0 <- c(100, 50, 75, 80, 20, 30) #let's say performance on a soccer task at time 1 - max 100
DV1 <- c(100, 60, 80, 80, 25, 40) #performance on soccer task at time 2
DV2 <- c(95, 55, 70, 70, 20, 35) #performance on soccer task at time 3
IV1.0 <- c(90, 60, 65, 75, 40, 50) #score on cognitive task A at time 1 - max 100
IV1.1 <- c(95, 70, 75, 80, 50, 70) #score on cog task A at time 2 
IV1.2 <- c(90, 55, 60, 70, 45, 60) #score on cog task A at time 3
IV2.0 <- c(10, 40, 50, 60, 20, 25) #score on cognitive task B at time 1 - max 100
IV2.1 <- c(20, 50, 60, 75, 35, 35) #score on cog task B at time 2
IV2.2 <- c(15, 40, 40, 55, 25, 25) #score on cog task B at time 3
id <- c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")

##create a data frame before pivot to a better format for longitudinal data
df <- data.frame(DV0, DV1, DV2, IV1.0, IV1.1, IV1.2, IV2.0, IV2.1, IV2.2,
                 id)
df.long <- long_panel(df, begin = 0, end = 2, label_location = "end")

#now onto the main analyses
#here I want to use "blmer" from the "blme" package to understand how
#performance on the soccer task is first affected by time alone (model1 below).
#Next, I want to check whether adding performance on cognitive task A
#influences performance (model2 below), before running the same analysis but
#with cognitive task B (model3 below). In this example I have just two
#cognitive tasks, but in my real work I have many more IVs to test (let's say
#they would be more cognitive tasks). Finally, I plan to add an individual
#slope and intercept based on the id variable

#time alone and soccer task performance
model1 <- blmer(DV ~ wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model1)

#new experimental model with cognitive task A performance added
model2 <- blmer(DV ~ IV1. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model2)
anova(model1, model2)

#a similar experimental model with cognitive task B performance instead of A
model3 <- blmer(DV ~ IV2. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model3)
anova(model1, model3)

#in the real data I then have many more models with IV1. or IV2. swapped for
#another independent variable (e.g., IV3. or IV4.), and as a result the code
#is very long. I'd like to know whether the above can be written in fewer
#lines of code. From what I've been reading, maybe I could loop somewhere
#so that "IV.*" is replaced each time?

#thanks in advance for any help!

So, if you have any way to run the code for model1, model2, and model3 in this example in fewer lines of code, that would be great.

You can create a function that receives the independent variable as a string, plus the data frame and other options, and builds the model formula with as.formula(). Then apply the function to each of your independent variables using lapply(). For the wave-only model (i.e. model 1), you can pass "" as the "independent variable".

get_model <- function(ind_var, df, REML = FALSE, fixef.prior = "normal", ...) {
  # build the formula from the variable name; "" gives the wave-only model
  f <- as.formula(paste0("DV ~ ", ind_var, " + wave + (1 | id)"))
  blmer(f, data = df, REML = REML, fixef.prior = fixef.prior, ...)
}
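
As an alternative to pasting the formula string together, the formula could be assembled with base R's reformulate(), which avoids the "" placeholder for the wave-only model. This is only a sketch: build_formula is a hypothetical helper name, not part of blme or of the function above, and the resulting formulas would still need to be passed to blmer().

```r
# Hypothetical helper (not from the original answer): build the model formula
# from an optional independent variable, always adding wave and the id intercept.
build_formula <- function(ind_var = NULL, response = "DV") {
  # c() drops NULL, so the wave-only model needs no special placeholder
  terms <- c(ind_var, "wave", "(1 | id)")
  reformulate(terms, response = response)
}

ind_vars <- c("IV1.", "IV2.")

# One formula for the wave-only model, plus one per independent variable
formulas <- c(
  list(baseline = build_formula()),
  lapply(setNames(ind_vars, ind_vars), build_formula)
)

names(formulas)
all.vars(formulas[["IV2."]])
```

Each element of formulas could then be handed to blmer() in place of f, with the list names identifying which independent variable each model uses.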

Now get a list called models:

models = lapply(c("", "IV1.", "IV2."), get_model, df=df.long)

You can run any anova you like, for example:

anova(models[[1]], models[[3]])

Output:

Data: df
Models:
models[[1]]: DV ~ +wave + (1 | id)
models[[3]]: DV ~ IV2. + wave + (1 | id)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
models[[1]]    4 141.41 144.98 -66.707   133.41                        
models[[3]]    5 133.12 137.57 -61.560   123.12 10.296  1   0.001333 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
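
With many independent variables, the baseline comparisons themselves can be looped in the same way. The sketch below uses a hypothetical helper, compare_to_baseline, and demonstrates it with lm() fits on the time-1 columns of the fictitious data purely as a stand-in so that it runs with base R alone; it is an assumption, not something tested here, that the same helper behaves equivalently on the list of blmer fits.

```r
# Hypothetical helper: compare the first model in a named list with each other one
compare_to_baseline <- function(models) {
  baseline <- models[[1]]
  lapply(models[-1], function(m) anova(baseline, m))
}

# Stand-in data: time-1 scores from the fictitious example
d <- data.frame(
  DV   = c(100, 50, 75, 80, 20, 30),
  IV1. = c(90, 60, 65, 75, 40, 50),
  IV2. = c(10, 40, 50, 60, 20, 25)
)

# Stand-in models: lm() instead of blmer(), intercept-only as the baseline
rhs  <- c(baseline = "1", IV1. = "IV1.", IV2. = "IV2.")
fits <- lapply(rhs, function(r) lm(as.formula(paste("DV ~", r)), data = d))

comparisons <- compare_to_baseline(fits)
names(comparisons)
```

The result is a list with one anova table per independent variable, named after the variable, so comparisons[["IV2."]] retrieves the comparison for cognitive task B.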

There is another option, which is to make df.long even "longer", and then estimate the models by the grouping variable. Here is an example of doing that with data.table:

library(data.table)
setDT(df.long)

df.longer=melt(df.long, measure=c("IV1.", "IV2."),variable.name = "ind_var")

rbind(
  df.long[, .(model=list(blmer(DV~wave+(1|id), REML=F, fixef.prior="normal")))][, ind_var:="None"],
  df.longer[, .(model=list(blmer(DV~value+wave+(1|id), REML=F, fixef.prior="normal"))), ind_var]
)

The output is a data.table of models:

            model ind_var
           <list>  <fctr>
1: <blmerMod[14]>    None
2: <blmerMod[14]>    IV1.
3: <blmerMod[14]>    IV2.
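
The same "stack, then fit per group" idea can also be sketched in base R without data.table. In the sketch below the long-format data are typed in by hand from the fictitious example, and lm() is used only as a stand-in so the code runs without blme; swapping in the blmer() call from above is an assumption left to the reader.

```r
# Fictitious data, already in long format (6 participants x 3 waves)
d <- data.frame(
  id   = rep(c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul"), times = 3),
  wave = rep(0:2, each = 6),
  DV   = c(100, 50, 75, 80, 20, 30, 100, 60, 80, 80, 25, 40, 95, 55, 70, 70, 20, 35),
  IV1. = c(90, 60, 65, 75, 40, 50, 95, 70, 75, 80, 50, 70, 90, 55, 60, 70, 45, 60),
  IV2. = c(10, 40, 50, 60, 20, 25, 20, 50, 60, 75, 35, 35, 15, 40, 40, 55, 25, 25)
)

iv_cols <- c("IV1.", "IV2.")

# Stack one copy of the data per independent variable ("even longer" format)
longer <- do.call(rbind, lapply(iv_cols, function(v) {
  data.frame(d[c("id", "wave", "DV")], ind_var = v, value = d[[v]])
}))

# Fit one model per independent variable; lm() is only a stand-in for blmer()
fits <- lapply(split(longer, longer$ind_var), function(g) {
  lm(DV ~ value + wave, data = g)
})

names(fits)
```

Because every model uses the same column name (value) for its predictor, the formula never has to be rebuilt; the grouping variable alone decides which independent variable each fit sees.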
