
how to get standardised (Beta) coefficients for multiple linear regression using statsmodels

When you call .summary() on a fitted statsmodels OLS model, the OLS Regression Results table includes the following columns.

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardised coefficients (which exclude the intercept), similarly to what is achievable in SPSS?

You just need to standardize your original DataFrame using a z distribution (i.e., z-scores) first and then perform the linear regression.

Assume your DataFrame is named df and has independent variables x1, x2, and x3 and a dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results
result.summary()

Now the coef column shows the standardized (beta) coefficients, so you can compare the relative influence of each predictor on your dependent variable.
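If you want to try this without your own data, here is a minimal sketch on synthetic data. The DataFrame, its column names, the seed and the true coefficients are all made up for illustration. It z-scores by hand with ddof=0, which is what stats.zscore computes by default, and checks that the slopes fitted on the z-scored data equal the raw slopes rescaled by sd(x) / sd(y).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumed for illustration only): predictors on very different scales
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(5, 10, n),
    "x3": rng.normal(-2, 0.1, n),
})
df["y"] = 1.0 + 2.0 * df["x1"] + 0.3 * df["x2"] - 4.0 * df["x3"] + rng.normal(0, 1, n)

formula = "y ~ x1 + x2 + x3"
predictors = ["x1", "x2", "x3"]

# Raw (unstandardized) fit
raw = smf.ols(formula, data=df).fit()

# Fit on z-scored data; (x - mean) / std with ddof=0 mirrors stats.zscore's default
df_z = (df - df.mean()) / df.std(ddof=0)
std_fit = smf.ols(formula, data=df_z).fit()

# Rescale the raw slopes by sd(x) / sd(y)
rescaled = raw.params[predictors] * df[predictors].std(ddof=0) / df["y"].std(ddof=0)

print(std_fit.params.round(5))
print(rescaled.round(5))
```

The intercept of the z-scored fit comes out at zero up to floating-point error, which is why the standardized output has no meaningful constant.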

Notes:

  1. Please keep in mind that you need .dropna(). Otherwise, stats.zscore will return all NaN for any column that has missing values.
  2. Instead of using .select_dtypes(), you can select the columns manually, but make sure all the columns you select are numeric.
  3. If you only care about the standardized (beta) coefficients, you can use result.params to return just those. They will usually be displayed in scientific notation; you can use something like round(result.params, 5) to round them.
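To see why note 1 matters, here is a small sketch on a hypothetical toy frame (the column names and values are made up). It only illustrates how stats.zscore behaves with its default settings when a column contains a missing value.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy frame with one missing value in column "b"
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
})

# Without dropna(): the NaN propagates through the mean and std, so all of "b" becomes NaN
z_with_nan = df.apply(stats.zscore)

# With dropna(): the incomplete row is removed first, so both columns stay finite
z_clean = df.dropna().apply(stats.zscore)

print(z_with_nan)
print(z_clean)
```

Keep in mind that .dropna() removes the whole row, so a missing value in any selected column costs you that observation in every column.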

We can just transform the estimated params by the standard deviation of the exog. results.t_test(transformation) computes the parameter table for the linearly transformed variables.

AFAIR, the following should produce the beta coefficients and corresponding inferential statistics.

Compute standard deviation, but set it to 1 for the constant.

std = model.exog.std(0)
std[0] = 1

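Then use results.t_test and look at the resulting parameter table. np.diag(std) creates a diagonal matrix that rescales the params by the standard deviation of each exog column. That alone standardizes only with respect to the regressors; if the endog has not been standardized, also divide by its standard deviation, as below, to get the usual beta coefficients. Note that std[0] = 1 assumes the constant is the first column of exog.

import numpy as np

# slopes scaled by sd(x) / sd(y)
tt = results.t_test(np.diag(std / model.endog.std()))
print(tt.summary())
tt.summary_frame()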

You can also convert the unstandardized coefficients directly: multiply each one by the standard deviation of its predictor and divide by the standard deviation of the dependent variable. Standardized coefficients (betas) are what a driver analysis requires. Below is the code that works for me. X holds the independent variables, y is the dependent variable, and coefficients are the fitted coefficients taken from model.params of the OLS fit.

sd_x = X.std()
sd_y = y.std()
beta_coefficients = []

# Iterate through independent variables and calculate beta coefficients.
# Coefficients are looked up by column name so that an intercept in
# model.params cannot shift the positions.
for col in X.columns:
    beta = coefficients[col] * (sd_x[col] / sd_y)
    beta_coefficients.append([col, beta])

# Print beta coefficients
for var, beta in beta_coefficients:
    print(f'{var}: {beta}')
