Stratification to adjust for confounding with R

I have data with 4 exposures (or attributes), each binary (e.g. high/low, true/false, red/blue), and 1 binary disease outcome (true/false: had the disease).

I want to calculate the relative risk of the disease outcome for each exposure, while controlling for confounding by the other exposures.

I would prefer to use stratification, but with 4 exposures that means a lot of strata, so I am open to multivariate analysis if there is an easy way to do it. By stratification I mean the approach described here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5384727/

Is there a software tool that can help me input a table with 5 columns (4 exposures, 1 disease outcome) and generate relative risk values (with 95% confidence intervals) by strata?

The data structure looks like this (in this sample, the first 3 columns are exposures and the last column is the outcome). These are just sample exposures to illustrate what I mean, not my actual exposures and outcome:

| had breakfast | exercised | slept over 7h | is happy |
|---|---|---|---|
| true | false | true | false |
| false | true | true | true |
| false | true | false | false |

I can't help you with stratification, but doing multiple logistic regression is pretty simple in R.

First some example data.
350 samples of three binary explanatory variables and one binary response variable. I also added one interaction between two of the explanatory variables.

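set.seed(1)
n <- 350
# three independent binary explanatory variables
v1 <- sample(0:1, n, replace=TRUE)
v2 <- sample(0:1, n, replace=TRUE)
v3 <- sample(0:1, n, replace=TRUE)
# latent score with a v1:v3 interaction plus noise, then dichotomised
re <- 0.6*v1 + 0.8*v2 + 0.6*v3 + v1*v3 + rnorm(n)
re <- re > 1.3

dtf <- data.frame(re, v1, v2, v3)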
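
Before fitting any model, it can help to look at the crude (unadjusted) risk ratio for each variable as a baseline. The following is only a sketch: `crude_rr` is a helper name made up for this illustration, the interval is a simple Wald interval on the log scale, and the data are regenerated so the block runs on its own.

```r
# Same simulated data as above, regenerated so this block is self-contained
set.seed(1)
n <- 350
v1 <- sample(0:1, n, replace = TRUE)
v2 <- sample(0:1, n, replace = TRUE)
v3 <- sample(0:1, n, replace = TRUE)
re <- (0.6*v1 + 0.8*v2 + 0.6*v3 + v1*v3 + rnorm(n)) > 1.3
dtf <- data.frame(re, v1, v2, v3)

# crude_rr is a hypothetical helper: unadjusted risk ratio with a Wald 95% CI
crude_rr <- function(outcome, exposure) {
  a  <- sum(outcome[exposure == 1])  # cases among the exposed
  n1 <- sum(exposure == 1)           # number exposed
  c0 <- sum(outcome[exposure == 0])  # cases among the unexposed
  n0 <- sum(exposure == 0)           # number unexposed
  rr <- (a / n1) / (c0 / n0)
  se <- sqrt(1/a - 1/n1 + 1/c0 - 1/n0)  # standard error of log(RR)
  z  <- qnorm(0.975)
  c(RR = rr, lower = exp(log(rr) - z * se), upper = exp(log(rr) + z * se))
}

# One row per explanatory variable
crude <- t(sapply(dtf[c("v1", "v2", "v3")], function(x) crude_rr(dtf$re, x)))
crude
```

These numbers ignore the other variables entirely, so they are only a reference point for the adjusted estimates.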

Then we regress.

# full model
mod0 <- glm(re ~ v1*v2*v3, data=dtf, family=binomial(link="logit"))
summary(mod0)

# full model minus three-way interaction
mod1 <- glm(re ~ v1*v2*v3 - v1:v2:v3, data=dtf, family=binomial(link="logit"))
summary(mod1)

# v1:v3 as only interaction
mod2 <- glm(re ~ v1+v2+v3 + v1:v3, data=dtf, family=binomial(link="logit"))
summary(mod2)

# compare the nested models with likelihood-ratio (chi-squared) tests
anova(mod0, mod1, mod2, test="Chisq")

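# odds ratios (exponentiated coefficients) with profile-likelihood confidence intervals
library(MASS)  # supplies the confint method for glm objects in R versions before 4.4
exp(cbind(OR=coef(mod2), confint(mod2)))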
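
Since the question asks for relative risk rather than odds ratios, one way to get a risk ratio out of the same logistic fit is to average its predicted risks with an exposure set to 1 for everyone and then to 0 for everyone (often called marginal standardization). This is a minimal sketch under the assumption that the model is adequate. The names `risk1`, `risk0` and `rr_v1` are made up for the illustration, and no confidence interval is computed (a bootstrap would be one way to get it).

```r
# Same simulated data and model as above, regenerated so this block is self-contained
set.seed(1)
n <- 350
v1 <- sample(0:1, n, replace = TRUE)
v2 <- sample(0:1, n, replace = TRUE)
v3 <- sample(0:1, n, replace = TRUE)
re <- (0.6*v1 + 0.8*v2 + 0.6*v3 + v1*v3 + rnorm(n)) > 1.3
dtf <- data.frame(re, v1, v2, v3)

mod2 <- glm(re ~ v1 + v2 + v3 + v1:v3, data = dtf, family = binomial(link = "logit"))

# Average predicted risk if everyone had v1 = 1, and if everyone had v1 = 0,
# keeping each person's observed v2 and v3
risk1 <- mean(predict(mod2, newdata = transform(dtf, v1 = 1), type = "response"))
risk0 <- mean(predict(mod2, newdata = transform(dtf, v1 = 0), type = "response"))

# Standardized risk ratio for v1, adjusted for v2 and v3 through the model
rr_v1 <- risk1 / risk0
rr_v1
```

The same two `predict` calls can be repeated with `v2` or `v3` in place of `v1` to get the other exposures.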

Because these are logistic regressions (logit link function), the coefficients are log odds ratios, not log risk ratios. If you want to estimate risk ratios directly, you have to use the logarithm as the link function, which strictly speaking is no longer logistic regression but a log-binomial model. This is generally advised against, partly because such models can fail to converge without sensible starting values, but it can be done.

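# log link; start at the overall log risk with all effects at 0, so that
# the initial fitted probabilities are valid (below 1)
mod3 <- glm(re ~ v1+v2+v3 + v1:v3, data=dtf, family=binomial(link="log"),
  start=c(log(mean(dtf$re)), 0, 0, 0, 0))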
summary(mod3)

# risk ratio
exp(cbind(coef(mod3), confint(mod3)))
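
For comparison with the regression results, the stratified analysis the question asks about can also be sketched in base R: split the data by every combination of the other exposures, count cases in each stratum, and pool with the Mantel-Haenszel risk ratio. This is only a rough illustration, not a tested tool. `mh_rr` is a helper name made up here, the interval uses the Greenland-Robins variance of the log risk ratio, and the pooling assumes the risk ratio is roughly constant across strata (which the simulated v1:v3 interaction actually violates).

```r
# Same simulated data as above, regenerated so this block is self-contained
set.seed(1)
n <- 350
v1 <- sample(0:1, n, replace = TRUE)
v2 <- sample(0:1, n, replace = TRUE)
v3 <- sample(0:1, n, replace = TRUE)
re <- (0.6*v1 + 0.8*v2 + 0.6*v3 + v1*v3 + rnorm(n)) > 1.3
dtf <- data.frame(re, v1, v2, v3)

# mh_rr is a hypothetical helper: Mantel-Haenszel pooled risk ratio for one
# exposure, stratified by all combinations of the `stratifiers` columns
mh_rr <- function(data, outcome, exposure, stratifiers) {
  strata <- split(data, data[stratifiers], drop = TRUE)
  cells <- t(sapply(strata, function(s) {
    y <- as.logical(s[[outcome]])
    x <- s[[exposure]] == 1
    c(a = sum(y & x), n1 = sum(x), c0 = sum(y & !x), n0 = sum(!x))
  }))
  a  <- cells[, "a"];  n1 <- cells[, "n1"]  # exposed: cases, total
  c0 <- cells[, "c0"]; n0 <- cells[, "n0"]  # unexposed: cases, total
  N  <- n1 + n0
  num <- sum(a * n0 / N)
  den <- sum(c0 * n1 / N)
  rr  <- num / den
  # Greenland-Robins variance estimate for log(RR_MH)
  v <- sum((n1 * n0 * (a + c0) - a * c0 * N) / N^2) / (num * den)
  z <- qnorm(0.975)
  c(RR = rr, lower = exp(log(rr) - z * sqrt(v)), upper = exp(log(rr) + z * sqrt(v)))
}

# Each exposure in turn, stratified by the remaining two
exposures <- c("v1", "v2", "v3")
mh <- t(sapply(exposures, function(e) mh_rr(dtf, "re", e, setdiff(exposures, e))))
mh
```

With 4 binary exposures each analysis has 8 strata instead of 4, so sparse or empty cells become more likely and the stratum counts are worth inspecting before trusting the pooled value.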
