
Python lmfit - how to calculate R squared?

This may be a stupid question, but I didn't find an answer anywhere in lmfit's documentation. My question is simple: how do I retrieve R squared? (I know I can calculate it manually with 1 - SS_res / SS_tot.)

Update: I tried calculating R squared myself and compared it to the R squared from statsmodels. The parameters are the same in both estimations, but the R squared values are not.

Code:

from lmfit import minimize, Parameters
import numpy as np
import statsmodels.api as sm
import random


x = np.linspace(0, 15, 10)
x_ols = sm.add_constant(x)
y = [random.randint(0, 15) for r in range(10)]

model = sm.OLS(y, x_ols)
results = model.fit()
print("OLS: ", format(results.params[0], '.5f'), format(results.params[1], '.5f'), "R^2: ", results.rsquared)


# define objective function: returns the array to be minimized
def fcn2min(params, x, data):
    a = params['a'].value
    b = params['b'].value

    model = a + b * x
    return model - data

for i in range(0, 1):
    # create a set of Parameters
    params = Parameters()
    params.add('a', value=i)
    params.add('b', value=20)

    # do fit, here with the default leastsq method
    result = minimize(fcn2min, params, args=(x, y))

    # read fitted values from result.params (recent lmfit versions
    # no longer update the input params in place)
    a_fit = result.params['a'].value
    b_fit = result.params['b'].value

    yhat = a_fit + b_fit * x
    ybar = np.sum(y) / len(y)
    ssreg = np.sum((yhat - ybar)**2)   # regression sum of squares
    sstot = np.sum((y - ybar)**2)      # total sum of squares
    r2 = ssreg / sstot

    print("lmfit: ", format(a_fit, '.5f'), format(b_fit, '.5f'), "R^2: ", r2)

I don't see an rsquared included in lmfit, but we can reuse either the residuals or the redchi.

I am using a similar example where y contains additional noise.

lmfit result (assuming the mean residual equals zero, which is always true for linear regression with an intercept):

>>> 1 - result.residual.var() / np.var(y)
0.98132815639800652
>>> 1 - result.redchi / np.var(y, ddof=2)
0.9813281563980063
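The identity used above can be reproduced without lmfit. Here is a minimal numpy-only sketch, with `np.polyfit` standing in for lmfit's least-squares fit (the data and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 15, 50)
y = x + rng.normal(scale=2.0, size=x.size)  # noisy straight line

# least-squares straight-line fit (stand-in for lmfit's minimize)
b, a = np.polyfit(x, y, 1)          # slope, intercept
residual = (a + b * x) - y          # same sign convention as fcn2min

# R^2 from the residuals, as in the lines above; valid because the
# mean residual of an OLS fit with intercept is zero
r2 = 1 - residual.var() / np.var(y)
print(f"R^2 = {r2:.4f}")
```

Because `residual.mean()` is essentially zero, `residual.var()` equals the mean squared residual, so this is exactly 1 - SS_res / SS_tot.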

compared to OLS results:

>>> results.rsquared
0.98132815639800663

This is the definition of rsquared when we compare to a model with just an intercept and without weights.

The calculation of rsquared in statsmodels is adjusted for the case when the regression does not include an intercept, and it takes weights into account for weighted least squares.
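The no-intercept adjustment amounts to using the uncentered total sum of squares, sum(y**2), instead of the centered one. A numpy-only sketch of the difference (a hypothetical fit through the origin, not part of the original example):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 15, 50)
y = 2 * x + rng.normal(scale=1.0, size=x.size)

# least-squares fit through the origin: y ≈ b*x, no intercept
b = (x @ y) / (x @ x)
rss = ((y - b * x) ** 2).sum()

tss_centered = ((y - y.mean()) ** 2).sum()    # used when an intercept is present
tss_uncentered = (y ** 2).sum()               # used when there is no intercept

r2_centered = 1 - rss / tss_centered
r2_uncentered = 1 - rss / tss_uncentered
print(r2_centered, r2_uncentered)
```

The uncentered TSS is always at least as large as the centered one, so the no-intercept R squared is never smaller.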

OK, the reason for that is that I chose random y values, so the fit was poor. Using a different random generator, which produces a better fit, gives an identical R squared. The modification is:

y = np.linspace(0, 15, 50) + [random.randint(0,15) for r in xrange(50)]

By the way, the adjusted R squared calculation is:

n = len(x)            # number of observations
p = len(params) - 1   # number of predictors, excluding the intercept
r2_adj = r2 - (1 - r2) * p / (n - p - 1)
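The one-liner above is algebraically the same as the textbook form 1 - (1 - r2) * (n - 1) / (n - p - 1); a quick check over a few illustrative r2 values:

```python
n, p = 50, 1  # sample size, number of predictors (slope only)
for r2 in (0.3, 0.75, 0.98):
    form_a = r2 - (1 - r2) * p / (n - p - 1)        # form used above
    form_b = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # textbook form
    assert abs(form_a - form_b) < 1e-12
print("both forms agree")
```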

You can compute it easily from the residual values:

rss = (result.residual**2).sum()   # same as result.chisqr
print(f"RSS/absolute sum of squares (Chi-square) = {rss:3.1f}")

tss = sum(np.power(y - np.mean(y), 2))
print(f"TSS = {tss:.1f}")

print(f"R² = {1 - rss/tss:.3f}")
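A self-contained numpy version of the same computation, with `np.polyfit` standing in for the lmfit result (made-up data for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 15, 50)
y = 3 + 0.8 * x + rng.normal(size=x.size)

b, a = np.polyfit(x, y, 1)
residual = (a + b * x) - y

rss = (residual ** 2).sum()           # plays the role of result.chisqr
tss = ((y - np.mean(y)) ** 2).sum()
print(f"R² = {1 - rss / tss:.3f}")
```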
