
How to keep 'Time' and 'Group' in this mixed linear regression analysis

I have the following study results. Two groups of mice were used: group A (which received drug A) and group B (which received drug B). Weight was measured at baseline and after 1 month, so the data are arranged as follows:

ID      Group       Time        Weight
1       A           basal       25          
1       A           1month      28
2       B           basal       29
2       B           1month      28
...
...

I want to determine whether the weight change differs between Group A and Group B.

I took code from a similar example on this page: https://scientificallysound.org/2017/08/24/the-likelihood-ratio-test-relevance-and-application/

How do I conduct a mixed linear regression for my study? I see 2 options:

md = smf.mixedlm("Weight ~ Group", data, groups=data["Time"])
mdf = md.fit(reml=False)
print(mdf.summary())

Or:

md = smf.mixedlm("Weight ~ Time", data, groups=data["Group"])
mdf = md.fit(reml=False)
print(mdf.summary())

Or should I just use ordinary linear regression? Here, too, there are 2 options:

`"Weight ~ Time + Group"` 

and

`"Weight ~ Time + Group + Time*Group"` ?

Note: the code above requires `import statsmodels.formula.api as smf`.

You can use a difference-in-differences (diff-in-diff) technique to get the result you want.

Create a dummy variable for the group (i.e. 1 if group == B, 0 if group == A). Then create another dummy variable for the time of sampling (0 at baseline, 1 after treatment).
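A minimal sketch of that dummy coding with pandas, using the four sample rows from the question. The column names `treated` and `post` are my own choice for illustration, not something from the question.

```python
import pandas as pd

# The four sample rows shown in the question.
data = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Group": ["A", "A", "B", "B"],
    "Time": ["basal", "1month", "basal", "1month"],
    "Weight": [25, 28, 29, 28],
})

# Hypothetical column names: 1 if group B, 0 if group A.
data["treated"] = (data["Group"] == "B").astype(int)
# 1 for the 1-month measurement, 0 for baseline.
data["post"] = (data["Time"] == "1month").astype(int)

print(data)
```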

Your 3rd option (the formula with the Time*Group term) then works as intended: the coefficient on the Time*Group interaction is the difference in weight change between the two groups.

I guess you know how to write the code better than I do, but statistics-wise, the 3rd option is definitely what you are looking for in order to measure the effect in your study.

EDIT: To be clear, the third option is `Weight ~ Group + Time + Group*Time`.
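Here is a rough sketch of that model with `smf.ols`. The data are synthetic and made up purely for illustration (20 mice per group, with arbitrary true changes of +3 and -1), and `treated` and `post` are hypothetical names for the two dummy variables. With two groups and two time points, the interaction coefficient should equal the difference between the two groups' mean weight changes, which the last lines compute by hand as a check.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, for illustration only (not the asker's data).
rng = np.random.default_rng(0)
n_mice = 20  # mice per group (assumed)
rows = []
for treated, true_change in [(0, 3.0), (1, -1.0)]:  # group A, group B
    for i in range(n_mice):
        mouse_id = treated * n_mice + i + 1
        basal = rng.normal(27.0, 2.0)
        after = basal + true_change + rng.normal(0.0, 1.0)
        rows.append((mouse_id, treated, 0, basal))
        rows.append((mouse_id, treated, 1, after))
data = pd.DataFrame(rows, columns=["ID", "treated", "post", "Weight"])

# Diff-in-diff regression: the treated:post coefficient is the estimate of interest.
result = smf.ols("Weight ~ treated + post + treated:post", data=data).fit()
did = result.params["treated:post"]

# Hand computation: (change in B) minus (change in A).
means = data.groupby(["treated", "post"])["Weight"].mean()
manual = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])

print(result.summary())
print(did, manual)
```

In a statsmodels formula, `treated*post` is shorthand for `treated + post + treated:post`, so writing the main effects out explicitly alongside `Group*Time` is harmless but redundant.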

I would probably use a mixed model when dealing with a panel dataset where there are unit-specific effects embodied in the error component. With a mixed model, you can handle data sampled from different distributions (a given dataset may, for example, be distributed differently from country to country and from period to period). In my experience, this scenario is most common with panel datasets. For your needs, I would definitely go with the diff-in-diff model, since the difference between times and groups is exactly what you want to measure, not an effect to neutralize.
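For comparison only, and going beyond what I recommended above: if you did want a mixed model, one plausible setup is to keep the same interaction term and group by mouse `ID`, rather than by `Time` or `Group`, so that each mouse gets its own random intercept. The sketch below uses synthetic data and the hypothetical dummy names `treated` and `post`. Treat it as an assumption about how the question's `mixedlm` call could be adapted, not as a tested recipe for your data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, for illustration only: 30 mice per group, two measurements each.
rng = np.random.default_rng(1)
rows = []
for treated, true_change in [(0, 3.0), (1, -1.0)]:  # group A, group B
    for i in range(30):
        mouse_id = treated * 30 + i
        mouse_effect = rng.normal(0.0, 2.0)  # per-mouse baseline offset
        for post in (0, 1):
            weight = 27.0 + mouse_effect + post * true_change + rng.normal(0.0, 1.0)
            rows.append((mouse_id, treated, post, weight))
data = pd.DataFrame(rows, columns=["ID", "treated", "post", "Weight"])

# Random intercept per mouse; the fixed-effect part is the same diff-in-diff formula.
md = smf.mixedlm("Weight ~ treated + post + treated:post", data, groups=data["ID"])
mdf = md.fit(reml=False)
print(mdf.summary())
```

The `treated:post` row of the summary plays the same role as in the plain regression; the difference is that the two measurements from one mouse are no longer treated as independent.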
