
Difference between statsmodels OLS and scikit-learn linear regression; different models give different R-squared

I am new to Python and trying to fit a simple linear regression. My model has one dependent variable and one independent variable. Using linear_model.LinearRegression() from the sklearn package, I got an R-squared of 0.16. Then I used import statsmodels.api as sm with mod = sm.OLS(Y_train, X_train) and got an R-squared of 0.61. Below is the code, starting from fetching the data from BigQuery.

****Code for linear regression**** 
    import numpy as np
    import pandas as pd
    from sklearn import linear_model
    from sklearn.metrics import r2_score

    # query and project_id are not shown in the original post
    train_data_df = pd.read_gbq(query, project_id)
    train_data_df.head()

    # Convert each column to a 2-D array of shape (n_samples, 1)
    X_train = train_data_df.revisit_next_day_rate.to_numpy()[:, np.newaxis]
    Y_train = train_data_df.demand_1yr_per_new_member.to_numpy()[:, np.newaxis]

    # scikit-learn version to get prediction R2
    model_sci = linear_model.LinearRegression()
    model_sci.fit(X_train, Y_train)

    print(model_sci.intercept_)
    print('Coefficients: \n', model_sci.coef_)
    # Square the residuals, not Y_train alone, before averaging
    print("Mean squared error: %.2f"
          % np.mean((model_sci.predict(X_train) - Y_train) ** 2))
    print('Variance score: %.2f' % model_sci.score(X_train, Y_train))
    Y_train_predict = model_sci.predict(X_train)
    print('R Square', r2_score(Y_train, Y_train_predict))


****For OLS****

    import statsmodels.api as sm

    print(Y_train[:3])
    print(X_train[:3])
    # Fit OLS on the same arrays used for the scikit-learn model
    mod = sm.OLS(Y_train, X_train)
    res = mod.fit()
    print(res.summary())

I am very new to this and am trying to understand which linear regression package I should use.

Found the difference: it was the intercept. sm.OLS does not include one by default, so after adding the code below the results matched.

    X = sm.add_constant(X)
    sm.OLS(y, X)
