Loop through dependent variables in a GLM when an independent variable fulfills a condition

Question

I have code that loops a logistic regression over several selected dependent variables (outcome1 to outcome4). I would like to run a model only if a condition on an independent variable is met. Let's say I want at least two females for each combination of outcome and type.

Dummy data:

set.seed(5)
df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)))
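
To make the condition concrete, here is a minimal sketch of how it could be checked for a single outcome with a cross-tabulation. It rebuilds only the columns it needs; the names `females`, `female_counts` and `enough_females` are illustrative and not part of the original code, and the fixed factor levels assume that type takes the values a to d and the outcome is coded 0/1.

```r
# Dummy data with the same structure as above (only the columns needed here).
set.seed(5)
df <- data.frame(
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)))

# Keep the female rows and count them in every type x outcome cell.
# Fixed factor levels keep cells with zero females in the table
# instead of silently dropping them.
females <- df[df$sex == "F", ]
female_counts <- table(
  type    = factor(females$type, levels = letters[1:4]),
  outcome = factor(females$outcome1, levels = c(0, 1)))
print(female_counts)

# TRUE only if every type x outcome combination has at least two females.
enough_females <- all(female_counts >= 2)
print(enough_females)
```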

Code to loop the GLM (credit to https://stats.idre.ucla.edu/r/codefragments/looping_strings/ ):

outcomelist <- names(df)[5:8]
modelall <- lapply(outcomelist, function(x) {
  glm(substitute(i ~ type + sex, list(i = as.name(x))), family = "binomial", data = df)})

I have found many questions about the loop itself, but none with an additional condition. I am thinking of a subset, but since I am not yet comfortable with lapply, I don't know where to put it.

If it is not too much of an additional question: I would also like each model to be named after its outcome variable (instead of 1 to 4), since it will otherwise be difficult to keep track of the models once the condition is added.

Appreciate any help!

Answer 1

One possibility is to drop the outcomes that fail the condition before running lapply():

df.new <- df

for (ii in seq_along(outcomelist)) {
  temp <- outcomelist[ii]

  # TRUE if any observed type x outcome combination has fewer than two females.
  # Note: aggregate() only returns combinations that occur in the data.
  condition <- any(aggregate(df$sex == "F", by = list(df$type, df[, temp]), FUN = sum)$x < 2)

  if (condition) {
    # requirement not met: remove the variable from df.new and outcomelist
    df.new[, temp] <- NULL
    outcomelist[ii] <- NA
  }
}

# drop the outcomes that were flagged above
outcomelist <- outcomelist[!is.na(outcomelist)]

modelall <- lapply(outcomelist, function(x) {
  glm(substitute(i ~ type + sex, list(i = as.name(x))), family = "binomial", data = df.new)})


# name the list
names(modelall) <- outcomelist
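
A possible variant of the approach above, offered as a sketch rather than the definitive solution: the check is moved into a small helper, outcomes are filtered with Filter(), and sapply(simplify = FALSE) names the resulting list after the outcomes. The helper `has_enough_females` and the vector `keep` are hypothetical names introduced here. The formula is built with reformulate() instead of substitute(). Unlike aggregate(), the table() call counts type x outcome combinations that have no rows at all as zero females.

```r
# Dummy data from the question.
set.seed(5)
df <- data.frame(
  id = 1:100,
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)))
outcomelist <- paste0("outcome", 1:4)

# Hypothetical helper: TRUE when every type x outcome cell of `data`
# contains at least `min_n` females. Factor levels are taken from the
# full data so that empty cells show up as zeros.
has_enough_females <- function(outcome, data, min_n = 2) {
  females <- data[data$sex == "F", ]
  counts <- table(
    factor(females$type, levels = sort(unique(data$type))),
    factor(females[[outcome]], levels = sort(unique(data[[outcome]]))))
  all(counts >= min_n)
}

# Keep only the outcomes that satisfy the condition.
keep <- Filter(function(x) has_enough_females(x, df), outcomelist)

# sapply(simplify = FALSE) returns a list named after the elements of `keep`.
modelall <- sapply(keep, function(x) {
  glm(reformulate(c("type", "sex"), response = x),
      family = "binomial", data = df)
}, simplify = FALSE)

names(modelall)
```

Individual models can then be accessed by outcome name, for example modelall[["outcome1"]], provided that outcome passed the check.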
