polr(..) ordinal logistic regression in R
I'm having some trouble using the polr function.
Here is a subset of my data:
library(MASS) # provides polr()
# response variable
rep = factor(c(0.00, 0.04, 0.06, 0.13, 0.15, 0.05, 0.07, 0.00, 0.06, 0.04, 0.05, 0.00, 0.92, 0.95, 0.95, 1, 0.97, 0.06, 0.06, 0.03, 0.03, 0.08, 0.07, 0.04, 0.08, 0.03, 0.07, 0.05, 0.05, 0.06, 0.04, 0.04, 0.08, 0.04, 0.04, 0.04, 0.97, 0.03, 0.04, 0.02, 0.04, 0.01, 0.06, 0.06, 0.07, 0.08, 0.05, 0.03, 0.06,0.03))
# "rep" is a discrete variable that represents a proportion, so it ranges between 0 and 1
# The proportions are discrete because each one is the share of TRUE values in a finite list of TRUE/FALSE values. Example: if the list has 3 elements, the proportion can only be 0, 1/3, 2/3 or 1
# predictor variables
set.seed(10)
pred.1 = sample(x=rep(1:5,10),size=50)
pred.2 = sample(x=rep(c('a','b','c','d','e'),10),size=50)
# the "pred" variables are discrete
# polr
polr(rep~pred.1+pred.2)
The subset I gave you works fine! But my entire data set, and some subsets of it, do not work! And I can't find anything in my data that differs from this subset except the quantity. So, here is my question: are there any limitations, for example in terms of the number of levels, that would lead to the following error message:
Error in optim(s0, fmin, gmin, method = "BFGS", ...) :
the initial value in 'vmin' is not finite
and the notification message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
(I had to translate these two messages into English, so they might not be 100% correct.)
I sometimes get only the notification message, and sometimes everything is fine, depending on which subset of my data I use.
For information, my rep variable has a total of 101 levels (and contains nothing other than the kind of data I described).
So it is a terrible question that I am asking, because I can't give you my full dataset and I don't know where the problem is. Can you guess where my problem comes from based on this information?
Thank you
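A minimal sketch of one way to look at the response levels, assuming (this is a guess, not something the data above confirms) that levels with very few observations are involved in the failure. It tabulates the subset's response and lists the levels that occur only once. The names `resp.v`, `resp` and `level.counts` are introduced here for illustration.

```r
# Response values copied from the subset in the question.
# `resp` is a hypothetical name, chosen to avoid masking base::rep.
resp.v = c(0.00, 0.04, 0.06, 0.13, 0.15, 0.05, 0.07, 0.00, 0.06, 0.04,
           0.05, 0.00, 0.92, 0.95, 0.95, 1, 0.97, 0.06, 0.06, 0.03,
           0.03, 0.08, 0.07, 0.04, 0.08, 0.03, 0.07, 0.05, 0.05, 0.06,
           0.04, 0.04, 0.08, 0.04, 0.04, 0.04, 0.97, 0.03, 0.04, 0.02,
           0.04, 0.01, 0.06, 0.06, 0.07, 0.08, 0.05, 0.03, 0.06, 0.03)
resp = factor(resp.v)

# Number of observations in each level of the response
level.counts = table(resp)
print(level.counts)

# Levels that appear only once; these carry very little information
singletons = names(level.counts)[level.counts == 1]
print(singletons)

# Average number of observations per level
print(length(resp) / nlevels(resp))
```

Running the same tabulation on the full data set and on the failing subsets would show whether they contain many more sparse levels than the subset that works.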
Following @joran's advice that your problem is probably the 100-level factor, I'm going to recommend something that probably isn't statistically valid but will probably still be effective in your particular situation: don't use logistic regression at all. Just drop it. Perform a simple linear regression and then discretize your output as necessary using a specialized rounding procedure. Give it a shot and see how well it works for you.
rep.v = c(0.00, 0.04, 0.06, 0.13, 0.15, 0.05, 0.07, 0.00, 0.06, 0.04, 0.05, 0.00, 0.92, 0.95, 0.95, 1, 0.97, 0.06, 0.06, 0.03, 0.03, 0.08, 0.07, 0.04, 0.08, 0.03, 0.07, 0.05, 0.05, 0.06, 0.04, 0.04, 0.08, 0.04, 0.04, 0.04, 0.97, 0.03, 0.04, 0.02, 0.04, 0.01, 0.06, 0.06, 0.07, 0.08, 0.05, 0.03, 0.06,0.03)
set.seed(10)
pred.1 = factor(sample(x=rep(1:5,10),size=50))
pred.2 = factor(sample(x=rep(c('a','b','c','d','e'),10),size=50))
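# pred.1 and pred.2 are already factors, so no as.factor() is needed
model = lm(rep.v ~ pred.1 + pred.2)
# predict() takes `newdata`; an unknown argument such as `newx` is silently ignored
output = predict(model, newdata=data.frame(pred.1, pred.2))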
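# Here's one way you could accomplish the discretization/rounding:
# snap each prediction to the nearest value observed in rep.v
f.levels = unique(rep.v)
rounded = sapply(output, function(x){
  d = abs(f.levels-x)
  f.levels[which.min(d)] # which.min returns a single index even if distances tie
})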
> rounded
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24
0.06 0.07 0.00 0.06 0.15 0.00 0.07 0.00 0.13 0.06 0.06 0.15 0.15 0.92 0.15 0.92 0.15 0.15 0.06 0.06 0.00 0.07 0.15 0.15
  25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48
0.15 0.15 0.00 0.00 0.15 0.00 0.15 0.15 0.07 0.15 0.00 0.07 0.15 0.00 0.15 0.15 0.00 0.15 0.15 0.15 0.92 0.15 0.15 0.00
  49   50
0.13 0.15
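To "see how well it works", one rough check is to compare the snapped predictions with the observed values. The sketch below is only an illustration: `snap.to.levels` is a hypothetical helper name, and the `observed` and `predicted` numbers are made up rather than taken from the model above.

```r
# Snap each value in x to the nearest entry of `levels`.
# When two entries are equally close, the first one is used.
snap.to.levels = function(x, levels) {
  sapply(x, function(v) levels[which.min(abs(levels - v))])
}

# Toy values, invented for illustration only
observed  = c(0.00, 0.04, 0.06, 0.92, 1.00)
predicted = c(0.012, 0.049, 0.055, 0.80, 0.97)

snapped = snap.to.levels(predicted, unique(observed))

# Share of predictions that land exactly on the observed level
exact.share = mean(snapped == observed)
# Mean absolute error after snapping
mae = mean(abs(snapped - observed))

print(exact.share)
print(mae)
```

With the real data, `observed` would be `rep.v` and `predicted` would be the output of `predict()`. Note that checking against the same data used to fit the model gives an optimistic picture; holding out part of the data would be a fairer test.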