How to structure a dataset to run a binomial GLM of a ratio of counts over time?
I am trying to use a binomial GLM to test for differences in relative count frequency over time (Days). The model formula would look something like this:
(A1:A2) ~ Day
Here we are testing for the effect of Day on the frequency of A1:A2. Basically this is a binomial generalized linear model where A1 and A2 are the read counts of the two alternative alleles at each gene, and Day is a multilevel factor. I would also be running this on many different genes (hundreds), so there will be many tests.
The basic model formula in R is straightforward (e.g. using a long-format dataset):
glm(cbind(AF1, AF2) ~ factor(Day), data = dfLong, family = binomial)  # cbind(allele-1 reads, allele-2 reads) is the binomial response
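A minimal sketch of the single-gene case, assuming the two allele counts sit in columns named AF1 and AF2 (the names from the question's code; the three rows are gene 1 from the example data). In R's binomial `glm()` the response is written as `cbind(successes, failures)`; `AF1:AF2` on the left-hand side would instead be evaluated as R's sequence operator. The overall effect of Day as a factor can then be tested with a likelihood-ratio test against the intercept-only model:

```r
# Gene 1 from the question's example data (AF1/AF2 names follow the question's code)
g1 <- data.frame(
  Day = factor(c(1, 3, 5)),
  AF1 = c(60, 55, 20),
  AF2 = c(40, 100, 10)
)

# Full model: a separate allele-1 proportion for each Day level
fit1 <- glm(cbind(AF1, AF2) ~ Day, data = g1, family = binomial)
# Null model: one common allele-1 proportion across all days
fit0 <- glm(cbind(AF1, AF2) ~ 1, data = g1, family = binomial)

# Likelihood-ratio (deviance) test of Day, 2 df (3 levels - 1)
lrt <- anova(fit0, fit1, test = "Chisq")
p_day <- lrt[["Pr(>Chi)"]][2]
p_day
```

Because there is only one row per Day, the full model is saturated (residual df = 0), so extra-binomial variation cannot be assessed from these data alone.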
But I'm not sure how to structure the data, or how to loop over the Gene variable, to accomplish this.
Here is an example data frame:
> df<-read.csv("test.csv")
> df
Gene A.count_1 A.count_2 Day
1 1 60 40 1
2 2 100 30 1
3 3 100 3 1
4 1 55 100 3
5 2 423 410 3
6 3 191 89 3
7 1 20 10 5
8 2 200 10 5
9 3 100 20 5
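A sketch of the two data layouts a binomial GLM can use, assuming test.csv contains exactly the rows printed above (typed in here so the code runs on its own). The printed layout, one row per Gene and Day with one column per allele count, already works with `cbind()`. The hypothetical long layout below (one row per allele, with made-up columns `isA1` and `reads`) uses a 0/1 response with the read count as a prior weight, and should give the same estimates for gene 1:

```r
# The question's example data, typed in instead of read.csv("test.csv")
df <- data.frame(
  Gene      = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  A.count_1 = c(60, 100, 100, 55, 423, 191, 20, 200, 100),
  A.count_2 = c(40, 30, 3, 100, 410, 89, 10, 10, 20),
  Day       = c(1, 1, 1, 3, 3, 3, 5, 5, 5)
)

# Layout 1: one row per Gene x Day, two count columns combined with cbind()
g1 <- subset(df, Gene == 1)
fit_counts <- glm(cbind(A.count_1, A.count_2) ~ factor(Day),
                  data = g1, family = binomial)

# Layout 2 (hypothetical): one row per Gene x Day x allele,
# a 0/1 indicator for allele 1 and the read count as a weight
dfLong <- data.frame(
  Gene  = rep(df$Gene, times = 2),
  Day   = rep(df$Day,  times = 2),
  isA1  = rep(c(1, 0), each = nrow(df)),
  reads = c(df$A.count_1, df$A.count_2)
)
fit_long <- glm(isA1 ~ factor(Day), weights = reads,
                data = subset(dfLong, Gene == 1), family = binomial)

# Same coefficient estimates from both layouts
all.equal(coef(fit_counts), coef(fit_long), tolerance = 1e-6)
```

The two-column layout is usually the more compact choice; the weighted long layout is only needed if the data already arrive one row per allele.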
The output I'd like is a test of the effect of Day, as a factor (not a numeric variable), on the allele count ratio for each gene, producing one p-value per gene (e.g. for genes 1, 2 and 3 here, or for hundreds of genes in the general case).
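A minimal base-R sketch of the per-gene loop, assuming the data are shaped like the example above (typed in here rather than read from test.csv). `split()` breaks the data frame up by Gene, a hypothetical helper `day_lrt_p()` fits the full and intercept-only binomial GLMs for one gene and returns the likelihood-ratio p-value for Day, and `p.adjust()` applies a Benjamini-Hochberg correction because hundreds of genes mean hundreds of tests:

```r
# The question's example data, typed in instead of read.csv("test.csv")
df <- data.frame(
  Gene      = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  A.count_1 = c(60, 100, 100, 55, 423, 191, 20, 200, 100),
  A.count_2 = c(40, 30, 3, 100, 410, 89, 10, 10, 20),
  Day       = c(1, 1, 1, 3, 3, 3, 5, 5, 5)
)
df$Day <- factor(df$Day)  # treat Day as categorical, not numeric

# Hypothetical helper: likelihood-ratio test of Day for one gene's rows
day_lrt_p <- function(d) {
  fit1 <- glm(cbind(A.count_1, A.count_2) ~ Day, data = d, family = binomial)
  fit0 <- glm(cbind(A.count_1, A.count_2) ~ 1,   data = d, family = binomial)
  anova(fit0, fit1, test = "Chisq")[["Pr(>Chi)"]][2]
}

# One p-value per gene, named by Gene
p <- sapply(split(df, df$Gene), day_lrt_p)

# Collect results and adjust for multiple testing across genes
res <- data.frame(Gene = names(p), p = unname(p),
                  p_BH = p.adjust(p, method = "BH"))
res
```

The same code applies unchanged to hundreds of genes; `res` then has one row per gene. BH is only one option; `p.adjust()` also accepts methods such as `"holm"` and `"bonferroni"`.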
Any help to set me in the right direction would be much appreciated.
Thanks!!
I think that
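library(lme4)
# lmList() fits one binomial GLM per Gene; the response must be cbind(successes, failures)
m <- lmList(cbind(AF1, AF2) ~ factor(Day) | Gene, data = dfLong, family = binomial)
summary(m)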
should probably do it?