
Statsmodels OLS Regression: Log-likelihood, uses and interpretation

I'm using Python's statsmodels package to do linear regressions. Among the output of R^2, p, etc., there is also "log-likelihood". In the docs this is described as "The value of the likelihood function of the fitted model." I've taken a look at the source code and don't really understand what it's doing.
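For concreteness, here is a minimal sketch (with made-up data) of where this value shows up in a statsmodels fit:

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: y depends linearly on x plus normal noise
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

    X = sm.add_constant(x)       # add the intercept column
    results = sm.OLS(y, X).fit()

    print(results.summary())     # "Log-Likelihood" appears in the summary table
    print(results.llf)           # the same value, as a plain float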

Reading more about likelihood functions, I still have only a fuzzy idea of what this 'log-likelihood' value might mean or be used for. So a few questions:

  • Isn't the value of the likelihood function, in the case of linear regression, the same as the value of the parameter (beta in this case)? It seems that way according to the derivation leading to equation 12 here: http://www.le.ac.uk/users/dsgp1/COURSES/MATHSTAT/13mlreg.pdf

  • What's the use of knowing the value of the likelihood function? Is it to compare with other regression models that have the same response and different predictors? How do practicing statisticians and scientists use the log-likelihood value spit out by statsmodels?

Likelihood (and by extension log-likelihood) is one of the most important concepts in statistics. It's used for everything.

For your first point, the likelihood is not the same as the value of the parameter. The likelihood is the likelihood of the entire model given a set of parameter estimates. It's calculated by taking a set of parameter estimates, computing the probability density of each observation under those estimates, and then multiplying the densities for all the observations together (this follows from probability theory, in that P(A and B) = P(A)P(B) if A and B are independent). In practice, what this means for linear regression, and what that derivation shows, is that you take a set of parameter estimates (beta, sd), plug them into the normal pdf, and calculate the density of each observation y at those parameter estimates. Then, multiply them all together. Typically, we choose to work with the log-likelihood because it's easier to compute: instead of multiplying we can sum, since log(a*b) = log(a) + log(b), which is computationally faster. Also, we tend to minimize the negative log-likelihood (instead of maximizing the positive), because optimizers sometimes work better on minimization than maximization.
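To make that recipe concrete, here is a rough sketch (with synthetic data) that plugs the fitted estimates into the normal pdf, sums the log densities, and compares the result against the llf reported by statsmodels. Using the ML variance estimate SSR/n for sigma is an assumption about the convention behind the reported value:

    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()

    mu = results.fittedvalues                    # X @ beta_hat
    sigma = np.sqrt(results.ssr / results.nobs)  # ML estimate of the error sd (assumed convention)

    # Log-likelihood = log of the product of the per-observation normal densities,
    # i.e. the sum of the log densities.
    loglik = norm.logpdf(y, loc=mu, scale=sigma).sum()

    print(loglik, results.llf)  # these should agree (up to floating-point error)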

To answer your second point, log-likelihood is used for almost everything. It's the basic quantity we use to find parameter estimates (maximum likelihood estimates) for a huge suite of models. For simple linear regression, these estimates turn out to be the same as the least-squares estimates, but for more complicated models least squares may not work. It's also used to calculate AIC, which can be used to compare models with the same response and different predictors (but penalizes the number of parameters, because more parameters always fit at least as well regardless).
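As a sketch of that kind of comparison (with made-up predictors), here are two nested models fit to the same response; the manual 2k - 2*logL computation is printed alongside statsmodels' own aic attribute, and the parameter count used (slopes plus intercept, not counting the error variance) is an assumption about the convention:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                 # an irrelevant, made-up predictor
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)

    m1 = sm.OLS(y, sm.add_constant(x1)).fit()
    m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    for name, m in [("x1 only", m1), ("x1 + x2", m2)]:
        k = m.df_model + 1                  # slopes plus intercept
        print(name, "log-likelihood:", round(m.llf, 2),
              "manual 2k - 2*logL:", round(2 * k - 2 * m.llf, 2),
              "statsmodels AIC:", round(m.aic, 2))

A lower AIC is preferred, so the extra predictor has to improve the log-likelihood by enough to offset its parameter penalty.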
