`ddply` fails to apply logistic regression (GLM) by group to my dataset

Question

I'm working out the LD50 (the dose lethal to 50% of a population) for multiple populations from different experiments using the MASS package. It's simple enough when I subset the data and fit one group at a time, but I get an error when I use ddply. Essentially I need an LD50 for each population at each temperature.

My data looks somewhat like this:

# dput(d)
d <- structure(list(Pop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L), .Label = c("a", "b", "c"), class = "factor"), Temp = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"), 
Dose = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), Dead = c(0L, 
11L, 12L, 14L, 2L, 16L, 17L, 7L, 5L, 3L, 17L, 15L, 9L, 20L, 
8L, 19L, 7L, 2L, 20L, 14L, 9L, 15L, 1L, 15L), Alive = c(20L, 
9L, 8L, 6L, 18L, 4L, 3L, 13L, 15L, 17L, 3L, 5L, 11L, 0L, 
12L, 1L, 13L, 18L, 0L, 6L, 11L, 5L, 19L, 5L)), .Names = c("Pop", 
"Temp", "Dose", "Dead", "Alive"), class = "data.frame", row.names = c(NA, 
-24L))

The following works fine:

d$Mortality <- cbind(d$Alive, d$Dead)
a <- d[d$Pop=="a" & d$Temp=="high",]
library(MASS)
dose.p(glm(Mortality ~ Dose, family="binomial", data=a), p=0.5)[1]

But when I put this into ddply I get the following error:

library(plyr)
d$index <- paste(d$Pop, d$Temp, sep="_")
ddply(d, 'index', function(x) dose.p(glm(Mortality~Dose, family="binomial", data=x), p=0.5)[1])

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

I can get the right LD50 when I use a proportion as the response, but I can't figure out where the approach above goes wrong (and I had already written this question).

Answer 1

This may surprise you, but if you use the formula

cbind(Alive, Dead) ~ Dose

instead of

Mortality ~ Dose

the problem will be gone.

library(MASS)
library(plyr)

## `d` is as your `dput` result

## a function to apply
f <- function(x) {
  fit <- glm(cbind(Alive, Dead) ~ Dose, family = "binomial", data = x)
  dose.p(fit, p=0.5)[[1]]
  }

## call `ddply`
ddply(d, .(Pop, Temp), f)

#  Pop Temp        V1
#1   a high 2.6946257
#2   a  low 2.1834099
#3   b high 2.5000000
#4   b  low 0.4830998
#5   c high 2.2899553
#6   c  low 2.5000000
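If a named column and a standard error are wanted instead of the default V1, the applied function can return a one-row data frame. This is only a sketch: the function name ld50_with_se and the column names LD50 and SE are made up for illustration, and the data frame is rebuilt compactly from the dput in the question so that the block runs on its own. It relies on dose.p storing the standard error in the "SE" attribute of its result.

```r
library(MASS)
library(plyr)

## same data as the `dput` in the question, rebuilt compactly
d <- data.frame(
  Pop   = factor(rep(c("a", "b", "c"), each = 8)),
  Temp  = factor(rep(rep(c("high", "low"), each = 4), times = 3)),
  Dose  = rep(1:4, times = 6),
  Dead  = c(0, 11, 12, 14, 2, 16, 17, 7, 5, 3, 17, 15,
            9, 20, 8, 19, 7, 2, 20, 14, 9, 15, 1, 15),
  Alive = c(20, 9, 8, 6, 18, 4, 3, 13, 15, 17, 3, 5,
            11, 0, 12, 1, 13, 18, 0, 6, 11, 5, 19, 5)
)

## hypothetical helper: one row per group with the LD50 and its standard error
ld50_with_se <- function(x) {
  ## the two-column response is built inside the formula, from the piece itself
  fit <- glm(cbind(Alive, Dead) ~ Dose, family = binomial, data = x)
  ld  <- dose.p(fit, p = 0.5)
  data.frame(LD50 = ld[[1]], SE = as.numeric(attr(ld, "SE")))
}

res <- ddply(d, .(Pop, Temp), ld50_with_se)
print(res)
```

The LD50 column should match the V1 column shown above, since the model is the same.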

So what happened with Mortality ~ Dose? Let's set .inform = TRUE when calling ddply:

## `d` is as your `dput` result
d$Mortality <- cbind(d$Alive, d$Dead)

## a function to apply
g <- function(x) {
  fit <- glm(Mortality ~ Dose, family = "binomial", data = x)
  dose.p(fit, p=0.5)[[1]]
  }

## call `ddply`
ddply(d, .(Pop, Temp), g, .inform = TRUE)

#Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
#Error: with piece 1: 
#  Pop Temp Dose Dead Alive Mortality
#1   a high    1    0    20        20
#2   a high    2   11     9         9
#3   a high    3   12     8         8
#4   a high    4   14     6         6

Now we see that the variable Mortality has lost its dimension: only the first column (Alive) is retained. For a glm with a binomial response, a response given as a single vector must be 0/1 values, a two-level factor, or proportions between 0 and 1 (with weights). Here we have the integers 20, 9, 8, 6, ..., hence glm complains:

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
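The symptom can be reproduced in base R without plyr. The sketch below rests on an assumption about the mechanism that the error output alone does not confirm: it supposes that each column of a piece is extracted by indexing it like a plain vector. The names m and rows are made up for illustration, and the values are the first five rows of the data.

```r
## a two-column matrix like d$Mortality (first five rows of the data)
m <- cbind(Alive = c(20L, 9L, 8L, 6L, 18L),
           Dead  = c(0L, 11L, 12L, 14L, 2L))
rows <- 1:4   # rows belonging to the first piece (Pop a, Temp high)

## matrix-style indexing keeps both columns
kept <- m[rows, , drop = FALSE]
print(dim(kept))      # 4 2

## vector-style indexing walks down the first column and drops the dimension
flat <- m[rows]
print(flat)           # 20 9 8 6
print(dim(flat))      # NULL
```

The flattened values are exactly the Mortality column printed for piece 1, which glm then treats as a single-vector response.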

I have not found a way to keep the matrix column intact inside ddply. I tried protecting it with I():

d$Mortality <- I(cbind(d$Alive, d$Dead))

but it still ends up with the same failure.
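If the precomputed Mortality column has to be kept, one option outside plyr is base R's split(), which subsets data frames by row and so keeps matrix columns, just as the manual subset in the question does. This is a sketch of an alternative, not a fix for ddply; the names pieces and ld50 are made up for illustration, and the data frame is rebuilt compactly from the dput in the question.

```r
library(MASS)

## same data as the `dput` in the question, rebuilt compactly
d <- data.frame(
  Pop   = factor(rep(c("a", "b", "c"), each = 8)),
  Temp  = factor(rep(rep(c("high", "low"), each = 4), times = 3)),
  Dose  = rep(1:4, times = 6),
  Dead  = c(0, 11, 12, 14, 2, 16, 17, 7, 5, 3, 17, 15,
            9, 20, 8, 19, 7, 2, 20, 14, 9, 15, 1, 15),
  Alive = c(20, 9, 8, 6, 18, 4, 3, 13, 15, 17, 3, 5,
            11, 0, 12, 1, 13, 18, 0, 6, 11, 5, 19, 5)
)
d$Mortality <- cbind(d$Alive, d$Dead)

## split by Pop and Temp; each piece is a data frame with a matrix column
pieces <- split(d, list(d$Pop, d$Temp))

## fit one model per piece and extract the LD50
ld50 <- sapply(pieces, function(x) {
  fit <- glm(Mortality ~ Dose, family = binomial, data = x)
  dose.p(fit, p = 0.5)[[1]]
})
print(ld50)
```

The result is a named numeric vector with names such as a.high and a.low rather than a data frame with Pop and Temp columns, so the cbind(Alive, Dead) ~ Dose formula with ddply remains the simpler route.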

solution1
4 ACCEPTED 2016-10-06 02:44:52
