How to structure dataset to run Binomial GLM of ratio of counts over time?

I am trying to do an analysis using a binomial GLM to test for differences in relative count frequency over time (Days). The GLM model/formula would look something like this:

(A1:A2) ∼ Day

Here we are testing for the effect of Day on the frequency of A1:A2. Basically this is a binomial generalized linear model where A1 and A2 refer to the read counts of alternative alleles at each gene and Day is a multilevel factor. The other thing is that I would be testing this on many different genes (hundreds), so we would be running many tests.

The basic model formula in R is straightforward (e.g. using a long format dataset):

glm(AF1:AF2 ~ Day, data = dfLong, family = "binomial")

But I'm not really sure how to structure the data or how to loop over the Gene variable to accomplish this task.

Here is an example dataframe:

> df<-read.csv("test.csv")
> df
  Gene A.count_1 A.count_2 Day
1    1        60        40   1
2    2       100        30   1
3    3       100         3   1
4    1        55       100   3
5    2       423       410   3
6    3       191        89   3
7    1        20        10   5
8    2       200        10   5
9    3       100        20   5

The output I'd like is a test of the effect of Day as a factor (not a numeric variable) on allele count ratios for each gene, producing one p-value per gene (e.g. genes 1, 2, and 3 here, or hundreds in the general case).

Any help to set me in the right direction would be much appreciated.

Thanks!!

I think that

library('lme4')
m <- lmList(AF1:AF2 ~ Day | Gene, data = dfLong, family = "binomial")
summary(m)

should probably do it?
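
If the lmList route gives trouble, a plain base-R loop is another way to get one p-value per gene. The following is only a minimal sketch under a few assumptions: the wide layout printed in the question is used as-is (no reshaping to long format), `AF1:AF2` is treated as a placeholder and replaced by the two-column response `cbind(A.count_1, A.count_2)` that `glm()` accepts for binomial counts, and the Day effect is tested with a likelihood-ratio test against the intercept-only model. The helper name `day_pvalue` and the Benjamini-Hochberg adjustment are illustrative choices, not something from the question.

```r
# Example data copied from the data frame printed in the question.
df <- data.frame(
  Gene      = rep(1:3, times = 3),
  A.count_1 = c(60, 100, 100, 55, 423, 191, 20, 200, 100),
  A.count_2 = c(40, 30, 3, 100, 410, 89, 10, 10, 20),
  Day       = rep(c(1, 3, 5), each = 3)
)
df$Day <- factor(df$Day)  # treat Day as a multilevel factor, not numeric

# Hypothetical helper: fit one binomial GLM for a single gene and return the
# likelihood-ratio p-value for Day (null deviance minus residual deviance).
day_pvalue <- function(d) {
  fit <- glm(cbind(A.count_1, A.count_2) ~ Day, data = d, family = binomial)
  lr_stat <- fit$null.deviance - fit$deviance
  lr_df   <- fit$df.null - fit$df.residual
  pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
}

# Split the data by gene, test each gene, then adjust for multiple testing.
p_raw <- sapply(split(df, df$Gene), day_pvalue)
res <- data.frame(
  Gene = names(p_raw),
  p    = p_raw,
  p_BH = p.adjust(p_raw, method = "BH")
)
res
```

With only one row per gene and day, as in the example data, the model with Day is saturated, so this test cannot detect overdispersion; with replicate samples per day the same code applies unchanged.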
