GLM function for Logistic Regression: what is the default predicted outcome?

Original post: 2017-01-08 00:32:54 · Tags: r, regression, glm

I am relatively new to modelling in R and came across the glm() function. I am interested in logistic regression using the 'binomial' family. My question: when my dependent variable can take one of two possible outcomes, say 'positive' or 'negative', which outcome are the estimates computed for by default? Does the model predict the log odds of a 'positive' or of a 'negative' outcome?

Also, which outcome is used by default for estimation when the dependent variable is

Yes or No
1 or 2
Pass or Fail

etc.?

Is there a rule by which R selects this default? Is there a way to override it manually? Please clarify.

1 Answer

It's in the Details section of ?binomial:

For the 'binomial' and 'quasibinomial' families the response can be specified in one of three ways:

As a factor: 'success' is interpreted as the factor not having the first level (and hence usually of having the second level). [Added note: the first level is usually the one that comes first alphabetically, since that is how R orders factor levels by default.]

As a numerical vector with values between '0' and '1', interpreted as the proportion of successful cases (with the total number of cases given by the 'weights').

As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
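The three forms above can be compared on the same data. The following is a minimal sketch with made-up counts (the `dose`, `successes` and `failures` values are hypothetical, chosen only for illustration); all three calls describe the same model, so the coefficients should agree up to numerical tolerance.

```r
# Hypothetical grouped data: successes and failures at four dose levels
dose      <- c(1, 2, 3, 4)
successes <- c(2, 5, 9, 14)
failures  <- c(18, 15, 11, 6)
total     <- successes + failures

# Two-column matrix: first column successes, second column failures
fit_matrix <- glm(cbind(successes, failures) ~ dose, family = binomial)

# Proportion of successes, with the number of cases passed as weights
fit_prop <- glm(successes / total ~ dose, family = binomial, weights = total)

# Factor response: one row per case, "failure" deliberately set as first level
long <- data.frame(
  dose    = rep(c(dose, dose), times = c(successes, failures)),
  outcome = factor(
    rep(c("success", "failure"), times = c(sum(successes), sum(failures))),
    levels = c("failure", "success")
  )
)
fit_factor <- glm(outcome ~ dose, family = binomial, data = long)

# Compare the estimates from the three specifications
rbind(matrix = coef(fit_matrix), prop = coef(fit_prop), factor = coef(fit_factor))
```

The residual deviance differs between the grouped and the one-row-per-case fits, but the coefficient estimates do not depend on which form is used.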

So the predicted probability is the probability of "success", i.e. of the second level of the factor, or of a 1 in the numeric case. With the default logit link, the coefficients are on the log-odds scale for that same outcome.
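One way to check which level is being modelled is to compare the factor fit with a fit on an explicit 0/1 indicator for the second level. This is a small sketch on simulated data (the variable names and the simulation itself are assumptions for illustration only):

```r
# Simulated toy data: the chance of "positive" increases with x
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(runif(200) < plogis(2 * x), "positive", "negative"))

levels(y)  # "negative" "positive": the first level is treated as the failure

# Fit with the factor response
fit <- glm(y ~ x, family = binomial)

# Fit with an explicit indicator: 1 when y is the second level, 0 otherwise
is_second <- as.integer(y == levels(y)[2])
fit01 <- glm(is_second ~ x, family = binomial)

# The two fits should give the same coefficients
all.equal(coef(fit), coef(fit01))

# Predicted values on the response scale are P(y == "positive")
head(predict(fit, type = "response"))
```

The slope on `x` comes out positive here because the model describes the log odds of "positive", the second level.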

From your examples:

Yes or No: by default "No" is treated as the failure (because it comes first alphabetically), so the model predicts the probability of "Yes". To reverse this, use my_data$my_factor <- relevel(my_data$my_factor, "Yes"), which makes "Yes" the first level and therefore the failure.
1 or 2: used as a plain numeric variable, this will either fail or produce bogus results. Either make the variable into a factor ("1" will be treated as the first level, so 2 is the success) or subtract 1 to get a 0/1 variable (or use 2-x if you want 2 to be treated as the failure).
Pass or Fail: see "Yes or No"; "Fail" comes first alphabetically, so by default the model predicts the probability of "Pass".
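The effect of relevel() and the options for a 1/2 coded variable can be sketched as follows. The data are simulated and all object names (`ans`, `code`, and so on) are hypothetical:

```r
# Simulated toy data: the chance of "Yes" increases with x
set.seed(42)
x   <- rnorm(200)
ans <- factor(ifelse(runif(200) < plogis(1.5 * x), "Yes", "No"))

# Default levels are "No", "Yes", so this models the log odds of "Yes"
fit_yes <- glm(ans ~ x, family = binomial)

# After relevel() the levels are "Yes", "No", so this models the log odds of "No"
ans_flipped <- relevel(ans, ref = "Yes")
fit_no <- glm(ans_flipped ~ x, family = binomial)

# Switching the modelled outcome flips the sign of every coefficient
all.equal(coef(fit_yes), -coef(fit_no), tolerance = 1e-6)

# The same outcome coded as 1 ("No") and 2 ("Yes")
code <- ifelse(ans == "Yes", 2, 1)

# Raw 1/2 values are rejected because the response must lie in [0, 1]
raw_try <- tryCatch(glm(code ~ x, family = binomial), error = function(e) e)
inherits(raw_try, "error")

# Option 1: convert to a factor; "1" is the first level, so 2 is the success
code_f <- factor(code)
fit_as_factor <- glm(code_f ~ x, family = binomial)

# Option 2: subtract 1 to obtain a 0/1 variable with 2 mapped to 1
code01 <- code - 1
fit_minus1 <- glm(code01 ~ x, family = binomial)

all.equal(coef(fit_as_factor), coef(fit_minus1))
```

Both recodings model the same outcome as `fit_yes`, so their coefficients match it as well.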