Using a proportion as an offset in a binomial (glm) model

Question

I am trying to test the effect of a treatment on the proportion of juveniles in a population of migrating birds. The birds were counted and identified as juveniles or adults daily, but the treatment was applied only every second day. Days without treatment were used as a control. The problem is that the proportion of juveniles in the population is expected to be affected not only by the treatment, but also by migration phenology. For example, it is possible that on a given day more juveniles migrated to the study area, and therefore this, and not only the treatment, affected the proportion of juveniles in the population. To account for this, I also checked the proportion of juveniles every day at a nearby site which was not affected by the treatment (i.e., a control site). Hence, I have two types of controls. To analyze the data, I thought of using a binomial GLMM, with the proportion of juveniles as the response variable, the treatment as a categorical explanatory variable (with or without treatment), and day as a random-intercept factor. I use weights to account for the different number of birds on each day, but I am not sure how to include the data from the control site. From what I read, it should be used as an offset, but I am not sure exactly how.

Is the link function affected by the fact that it (the proportion of juveniles at the control site) is a proportion? Is it better to use the proportion of juveniles at the control site in an interaction instead of as an offset (i.e., ~ Treatment * Juv.prop.cntrl.site)?

This is the model I have so far, but I am not sure if it makes sense, especially whether the offset is set correctly:

glm(Juv.prop.exp.site ~ Treatment + Day, offset = Juv.prop.cntrl.site, weights = Tot.birds.exp.site, data = df, family = Binomial)

Where Juv.prop.exp.site is the number of juveniles divided by the total at this site (juveniles + adults). See the data here: DATA (day starts at 11, because during the first 10 days no birds of that species were observed).

Thanks a lot in advance and happy new year!

Answer 1

Normally, I would suggest that questions regarding statistical analysis be migrated to CrossValidated, where you will get better answers to purely statistical questions. However, in your case, it will help a lot to reshape your data into a tidy format before analysis, which is more of a programming problem.

Essentially, you need one column each for day, site, treatment, number of juveniles, and number of adults. I am assuming that in your data, "V" is the treatment and "X" is the control.

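library(tidyverse)

df <- data %>%
  # keep only the six columns needed, selected by position
  select(1, 2, 4, 5, 8, 9) %>%
  # replace the ".site" suffix with "_site" so its dot is not treated as a separator below
  rename_all(~gsub("\\.site", "_site", .x)) %>%
  # split each count column name at the "." into a value name and a Site label
  pivot_longer(1:4, names_sep = "\\.", names_to = c(".value", "Site")) %>%
  # rows from the control site are never treated, so they always get "X"
  mutate(Treatment = ifelse(Site == "Exp_site", Treatment, "X")) %>%
  # recode "V" / "X" into readable labels
  mutate(Treatment = ifelse(Treatment == "V", "Treatment", "Control")) %>%
  mutate(Site = ifelse(Site == "Exp_site", "Experimental", "Control")) %>%
  rename(Juveniles = Juv, Adults = Ad) %>%
  # reorder the columns
  select(2, 1, 3:5)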
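
To see what the pivot_longer() step does in isolation, here is a minimal sketch on made-up data. The column names and counts below are assumptions for illustration only (in particular "Ctrl_site" is hypothetical; the real column names are in the linked data). The special name ".value" tells pivot_longer() that the part of each column name before the separator ("Juv" or "Ad") becomes an output column, while the part after it becomes the Site label.

```r
library(tidyr)

# Hypothetical wide layout: one row per day, counts for each site side by side.
wide <- data.frame(
  Juv.Exp_site  = c(2, 6),
  Ad.Exp_site   = c(1, 3),
  Juv.Ctrl_site = c(1, 4),
  Ad.Ctrl_site  = c(0, 2),
  Treatment     = c("V", "V"),
  Day           = c(12, 14)
)

# Split each count column name at the "." into a value name and a site label.
long <- pivot_longer(
  wide,
  cols      = 1:4,
  names_sep = "\\.",
  names_to  = c(".value", "Site")
)

print(long)
```

Each day now contributes one row per site, with the juvenile and adult counts in their own columns.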

This makes your data look like this, and to my mind this is easier to analyse (and to reason about):

df
#> # A tibble: 100 x 5
#>      Day Treatment Site         Juveniles Adults
#>    <int> <chr>     <chr>            <int>  <int>
#>  1    11 Control   Experimental         1      0
#>  2    11 Control   Control              0      0
#>  3    12 Treatment Experimental         2      1
#>  4    12 Control   Control              1      0
#>  5    13 Control   Experimental         2      0
#>  6    13 Control   Control              1      1
#>  7    14 Treatment Experimental         6      3
#>  8    14 Control   Control              4      2
#>  9    15 Control   Experimental         6      4
#> 10    15 Control   Control              1      2
#> # ... with 90 more rows
#> # i Use `print(n = ...)` to see more rows

You can then perform a binomial glm like this, with Treatment and Site as independent variables.

model <- glm(cbind(Juveniles, Adults) ~ Treatment + Site, 
             data = df, family = binomial)

summary(model)
#> Call:
#> glm(formula = cbind(Juveniles, Adults) ~ Treatment + Site, family = binomial, 
#>     data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -3.4652  -0.6971   0.0000   0.7895   2.9541  
#> 
#> Coefficients:
#>                    Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)          1.0059     0.1461   6.886 5.74e-12 ***
#> TreatmentTreatment   0.3012     0.2877   1.047    0.295    
#> SiteExperimental    -0.1632     0.2598  -0.628    0.530    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 118.16  on 88  degrees of freedom
#> Residual deviance: 117.07  on 86  degrees of freedom
#> AIC: 244.13
#> 
#> Number of Fisher Scoring iterations: 4
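
To relate this back to the formulation in the question, here is a small sketch on made-up numbers (the toy data frame and its column Juv.prop.ctrl are assumptions for illustration, not the real data). It shows two things: a two-column cbind() response gives the same coefficients as a proportion response with the totals as weights, and an offset in glm() is added on the scale of the link function. With the default logit link, a raw proportion passed as an offset would therefore be read as log-odds; one option, if an offset is wanted at all, is to convert it with qlogis() first.

```r
# Made-up counts; Juv.prop.ctrl stands in for the control-site proportion.
toy <- data.frame(
  Treatment     = c("Control", "Treatment", "Control",
                    "Treatment", "Control", "Treatment"),
  Juveniles     = c(2, 6, 6, 5, 4, 7),
  Adults        = c(1, 3, 4, 2, 3, 2),
  Juv.prop.ctrl = c(0.50, 0.60, 0.40, 0.70, 0.55, 0.65)
)
toy$Total <- toy$Juveniles + toy$Adults
toy$Prop  <- toy$Juveniles / toy$Total

# (1) Counts as a two-column response ...
fit_counts <- glm(cbind(Juveniles, Adults) ~ Treatment,
                  data = toy, family = binomial)
# ... versus a proportion response weighted by the number of birds.
fit_prop <- glm(Prop ~ Treatment, weights = Total,
                data = toy, family = binomial)
same_fit <- all.equal(coef(fit_counts), coef(fit_prop), tolerance = 1e-6)

# (2) The offset enters the linear predictor on the logit scale,
#     so the proportion is converted to log-odds with qlogis().
fit_off <- glm(cbind(Juveniles, Adults) ~ Treatment +
                 offset(qlogis(Juv.prop.ctrl)),
               data = toy, family = binomial)

# Linear predictor = design matrix %*% coefficients + offset.
eta_manual <- drop(model.matrix(fit_off) %*% coef(fit_off)) +
  qlogis(toy$Juv.prop.ctrl)
```

Note that qlogis() returns infinite values for proportions of exactly 0 or 1, so days on which the control site had only juveniles or only adults would need separate handling. An offset also fixes the coefficient of the control-site log-odds at 1, whereas entering Site as a term (as in the model above) lets the data estimate the site difference.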

Answer 1
0 2023-01-01 14:00:45
