This is a follow-up to my previous post, "Estimating parameters using stan when the distribution for response variable in a regression is non-normal".
Let's say I have the data below:
dat = list(y = c(0.00792354094929414, 0.00865300734292492, 0.0297400780486734,
0.0196358416326437, 0.00239020640762042, 0.0258055591736283,
0.17394835142698, 0.156463554455613, 0.329388185725557, 0.00764435088817635,
0.0162081480398152, 0, 0.00157591399416963, 0.420025972703085,
0.000122623651944455, 0.133061480234834, 0.565454216154227, 0.000281973481299731,
0.000559715156383041, 0.0270686389659072, 0.918300537689865,
0.00000782624683025728, 0.00732414341919458, 0, 0, 0, 0, 0, 0,
0, 0.174071274611405, 0.0432109713717948, 0.0544400838264943,
0, 0.0907049925221286, 0.616680102647887, 0, 0), x = c(23.8187587698947,
15.9991138359515, 33.6495930512881, 28.555818797764, -52.2967967248258,
-91.3835208788233, -73.9830692708321, -5.16901145289629, 29.8363012310241,
10.6820057903939, 19.4868517164395, 15.4499668436458, -17.0441644773509,
10.7025053739577, -8.6382953428539, -32.8892974839165, -15.8671863161348,
-11.237248036145, -7.37978020066205, -3.33500586334862, -4.02629933182873,
-20.2413384726948, -54.9094885578775, -48.041459120976, -52.3125732905322,
-35.6269065970458, -62.0296155423529, -49.0825017152659, -73.0574478287598,
-50.9409090127938, -63.4650928035253, -55.1263264283842, -52.2841103768755,
-61.2275334149805, -74.2175990067417, -68.2961107804698, -76.6834643609286,
-70.16769103228), N = 38)
I want to fit a logit model to the above data, treating y as a fractional response variable. My Stan model code is below:
model = "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
transformed data {
vector[N] z = bernoulli_rng(y);
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
transformed parameters {
vector[N] mu;
mu = alpha + beta * x;
}
model {
sigma ~ normal(0, 1);
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
z ~ bernoulli(mu);
}
"
sampling(stan_model(model_code = model), data = dat, chains = 4, iter = 50000, refresh = 0)
With this, I get the error below:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable definition base type mismatch, variable declared as base type vector variable definition has base type int[ ] error in 'model93e37bdec88_3b62e3bb17b9f3ed9c717c98aa6ca9ac' at line 9, column 32
-------------------------------------------------
7:
8: transformed data {
9: vector[N] z = bernoulli_rng(y);
^
10: }
-------------------------------------------------
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'sampling': failed to parse Stan model '3b62e3bb17b9f3ed9c717c98aa6ca9ac' due to the above error.
Could you please help me find the correct specification of the Stan model?
There might be a deeper issue than how to model saturated probabilities (probabilities that are either exactly 0 or exactly 1).
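As a quick check of how much of the response is saturated, here is a minimal sketch. The helper name `summarise_saturated` and the short vector `y_small` are made up for illustration; the values in `y_small` are copied from the question's `y`.

```r
# Summarise how many fractional responses sit exactly on the boundary.
# `summarise_saturated` is a hypothetical helper, not part of any package.
summarise_saturated <- function(y) {
  z <- qlogis(y)  # logit transform: 0 maps to -Inf, 1 maps to Inf
  c(
    n = length(y),
    zeros = sum(y == 0),
    ones = sum(y == 1),
    finite_logit = sum(is.finite(z))
  )
}

# A few values copied from the question's y, for illustration only.
y_small <- c(0.00792354094929414, 0, 0.329388185725557, 0.918300537689865, 0)
summarise_saturated(y_small)
```

Calling `summarise_saturated(dat$y)` on the full data should show the same pattern: every exact zero has no finite value on the logit scale.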
Here is a plot of your data. Visually, there isn't much of a relationship between x and y.
library("tidyverse")
as_tibble(dat) %>%
ggplot(
aes(x, y)
) +
geom_point() +
scale_y_continuous(
limits = c(0, 1)
)
Created on 2022-03-13 by the reprex package (v2.0.1)
And things don't get better on the logit scale, i.e., with the transformation z = logit(y).
library("tidyverse")
as_tibble(dat) %>%
# The transformation maps the saturated probabilities to -Inf (y = 0) or Inf (y = 1).
mutate(
z = qlogis(y)
) %>%
# So only the finite values show up as ordinary points in the plot.
ggplot(
aes(x, z)
) +
geom_point()
Created on 2022-03-13 by the reprex package (v2.0.1)
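As for the parser error itself: the message says that bernoulli_rng returns an integer array, while z is declared as a vector. The sketch below only addresses that type mismatch and the link function; it does not resolve the weak relationship between x and y shown above. The name `model_fixed` is made up here, the unused sigma is dropped, and the `int z[N]` declaration assumes the older Stan syntax that matches the error output in the question (newer Stan versions write `array[N] int z` instead).

```r
# A hedged sketch of the question's model with the type mismatch fixed.
# `model_fixed` is a hypothetical name; the Stan code uses the older array syntax.
model_fixed = "
data {
  int<lower=0> N;
  vector[N] x;
  vector<lower=0, upper=1>[N] y;
}
transformed data {
  // bernoulli_rng returns integers, so z must be an integer array, not a vector.
  int z[N] = bernoulli_rng(y);
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  // bernoulli_logit applies the inverse logit to the linear predictor,
  // so alpha + beta * x does not have to lie in [0, 1].
  z ~ bernoulli_logit(alpha + beta * x);
}
"
```

The string can then be passed to stan_model(model_code = model_fixed) and sampling() as in the question. Note that z is drawn only once in transformed data, so the fit depends on that single random draw.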