
Loop through dependent variables in a GLM model when an independent variable fulfills a condition

I have code that loops a logistic regression over several selected dependent variables (outcome1 to outcome4). I would like to run the model only when a condition on an independent variable is met. Say I want at least two females for each combination of outcome and type.

Dummy data:

set.seed(5)
df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)))

Code to loop the GLM (credit to https://stats.idre.ucla.edu/r/codefragments/looping_strings/ ):

outcomelist <- names(df)[5:8]
modelall <- lapply(outcomelist, function(x) {
  glm(substitute(i ~ type + sex, list(i = as.name(x))), family = "binomial", data = df)})

I have found many questions about the loop itself, but none with an additional condition. I am thinking of using subset, but since I am not yet proficient with lapply, I don't know where to put it.

If this does not count as a separate question: I would also like each model to be named after its outcome variable (instead of 1 to 4), since otherwise it will be hard to keep track of the models once the condition is added.

Appreciate any help!

One possibility is to clean the data before running lapply():

df.new <- df

for (ii in seq_along(outcomelist)) {
  temp <- outcomelist[ii]

  # Count the females in each type x outcome combination for outcome ii.
  # condition is TRUE when any combination has fewer than two females,
  # i.e. when the requirement is NOT satisfied.
  # Note: aggregate() only returns combinations that occur in the data.
  condition <- any(aggregate(df$sex == "F", by = list(df$type, df[, temp]), FUN = "sum")$x < 2)

  if (condition) {
    # too few females: remove the variable from df.new and outcomelist
    df.new[, temp] <- NULL
    outcomelist[ii] <- NA
  }
}

# drop the outcomes that were flagged above
outcomelist <- outcomelist[!is.na(outcomelist)]

modelall <- lapply(outcomelist, function(x) {
  glm(substitute(i ~ type + sex, list(i = as.name(x))), family = "binomial", data = df.new)})

# name the list after the outcome variables
names(modelall) <- outcomelist
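
The same idea can also be written without modifying the data frame. The following is only a sketch under a few assumptions: enough_females() is a hypothetical helper (it is not part of the original answer), the outcomes are coded 0/1 as in the dummy data, and reformulate() is swapped in for substitute() to build the formula. It counts females with table(), which keeps empty type x outcome cells as 0, and names the list through setNames().

```r
# Rebuild the question's dummy data so the sketch runs on its own.
set.seed(5)
df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)))

outcomelist <- names(df)[5:8]

# Hypothetical helper: TRUE when every type x outcome cell contains at
# least min_f females. Assumes the outcome is coded 0/1.
enough_females <- function(outcome, data, min_f = 2) {
  females <- data[data$sex == "F", ]
  counts <- table(
    factor(females$type, levels = sort(unique(data$type))),
    factor(females[[outcome]], levels = c(0, 1)))
  all(counts >= min_f)
}

# Keep only the outcomes that satisfy the requirement.
keep <- Filter(function(x) enough_females(x, df), outcomelist)

# lapply() carries the names of its input over to the result, so each
# model is stored under the name of its outcome variable.
modelall <- lapply(setNames(keep, keep), function(x) {
  glm(reformulate(c("type", "sex"), response = x), family = "binomial", data = df)
})
```

With this layout a model can be retrieved by name, for example modelall[["outcome1"]], provided that outcome passed the check; names(modelall) shows which outcomes were kept.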
