I'm working out the LD50 (lethal dosage) for multiple populations from different experiments using the MASS
package. It's simple enough when I subset the data and do one at a time, but I'm getting an error when I use ddply
. Essentially I need an LD50 for each population at each temperature.
My data looks somewhat like this:
# dput(d)
d <- structure(list(Pop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("a", "b", "c"), class = "factor"), Temp = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"),
Dose = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), Dead = c(0L,
11L, 12L, 14L, 2L, 16L, 17L, 7L, 5L, 3L, 17L, 15L, 9L, 20L,
8L, 19L, 7L, 2L, 20L, 14L, 9L, 15L, 1L, 15L), Alive = c(20L,
9L, 8L, 6L, 18L, 4L, 3L, 13L, 15L, 17L, 3L, 5L, 11L, 0L,
12L, 1L, 13L, 18L, 0L, 6L, 11L, 5L, 19L, 5L)), .Names = c("Pop",
"Temp", "Dose", "Dead", "Alive"), class = "data.frame", row.names = c(NA,
-24L))
The following works fine:
d$Mortality <- cbind(d$Alive, d$Dead)
a <- d[d$Pop=="a" & d$Temp=="high",]
library(MASS)
dose.p(glm(Mortality ~ Dose, family="binomial", data=a), p=0.5)[1]
But when I put this into ddply
I get the following error:
library(plyr)
d$index <- paste(d$Pop, d$Temp, sep="_")
ddply(d, 'index', function(x) dose.p(glm(Mortality~Dose, family="binomial", data=x), p=0.5)[1])
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
I can get the right LD50 when I use a proportion but can't figure out where I've gone wrong with my approach (and had already written this question).
Perhaps this will amaze you. But if you choose to use formula
cbind(Alive, Dead) ~ Dose
instead of
Mortality ~ Dose
the problem will be gone.
library(MASS)
library(plyr)
## `d` is as your `dput` result
## a function to apply
f <- function(x) {
fit <- glm(cbind(Alive, Dead) ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), f)
# Pop Temp V1
#1 a high 2.6946257
#2 a low 2.1834099
#3 b high 2.5000000
#4 b low 0.4830998
#5 c high 2.2899553
#6 c low 2.5000000
So what happened with Mortality ~ Dose
? Let's set .inform = TRUE
when calling ddply
:
## `d` is as your `dput` result
d$Mortality <- cbind(d$Alive, d$Dead)
## a function to apply
g <- function(x) {
fit <- glm(Mortality ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), g, .inform = TRUE)
#Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
#Error: with piece 1:
# Pop Temp Dose Dead Alive Mortality
#1 a high 1 0 20 20
#2 a high 2 11 9 9
#3 a high 3 12 8 8
#4 a high 4 14 6 6
Now we we see that variable Mortality
has lost dimension, and only the first column ( Alive
) is retained. For a glm
with binomial
response, if the response is a single vector, glm
expects 0-1 binary or a factor of two levels. Now, we have integers 20, 9, 8, 6, ..., hence glm
will complain
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
There is really no way to fix this issue. I have tried using a protector:
d$Mortality <- I(cbind(d$Alive, d$Dead))
but it still ends up with the same failure.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.