When using the .summary() function in statsmodels (with pandas), the OLS Regression Results include the following fields:
coef std err t P>|t| [0.025 0.975]
How can I get the standardised coefficients (which exclude the intercept), similarly to what is achievable in SPSS?
You just need to standardize your original DataFrame using a z distribution (i.e., z-scores) first and then perform a linear regression.
Assume your DataFrame is named df, with independent variables x1, x2, and x3 and dependent variable y. Consider the following code:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)
# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()
# checking results
result.summary()
Now, the coef column will show you the standardized (beta) coefficients, so that you can compare their influence on your dependent variable.
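As a sanity check on the z-score approach, the sketch below uses only NumPy and synthetic data (the data, the seed, and the helper `ols` are made up for illustration). It fits the regression once on z-scored variables and once on raw variables, then rescales the raw slopes by sd(x) / sd(y). Under these assumptions the two sets of slopes should agree.

```python
import numpy as np

# Synthetic data: three predictors on very different scales (illustrative only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3)) * np.array([1.0, 5.0, 0.2])
y = 2.0 + X @ np.array([1.5, -0.3, 4.0]) + rng.normal(size=n)

def ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Route 1: z-score every variable (population std, ddof=0, the default of
# scipy.stats.zscore), then fit.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
beta_z = ols(Xz, yz)

# Route 2: fit on the raw data, then rescale each slope by sd(x_j) / sd(y).
b_raw = ols(X, y)
beta_rescaled = b_raw[1:] * X.std(axis=0) / y.std()

print("slopes from z-scored fit:", beta_z[1:])
print("rescaled raw slopes:     ", beta_rescaled)
print("intercept of z-scored fit:", beta_z[0])
```

The intercept of the z-scored fit is zero up to floating-point error, which is why standardized output typically omits it.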
Notes:
- You need .dropna(). Otherwise, stats.zscore will return all NaN for a column if it has any missing values.
- Instead of .select_dtypes(), you can select columns manually, but make sure all the columns you select are numeric.
- If you only need the coefficients, use result.params to return them alone. They will usually be displayed in scientific notation; you can use something like round(result.params, 5) to round them.
We can just transform the estimated params by the standard deviation of the exog. results.t_test(transformation) computes the parameter table for the linearly transformed variables.
AFAIR, the following should produce the beta coefficients and corresponding inferential statistics.
Compute the standard deviation of each exog column, but set it to 1 for the constant (assumed here to be the first column).
std = model.exog.std(0)
std[0] = 1
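Then use results.t_test and look at the params_table. np.diag(std) creates a diagonal matrix that transforms the params. Note that this rescales by the exog standard deviations only; for fully standardized betas (b * sd_x / sd_y), also divide the slope entries of std by model.endog.std().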
tt = results.t_test(np.diag(std))
print(tt.summary())
tt.summary_frame()
You can convert unstandardized coefficients to standardized ones using the standard deviations: multiply each coefficient by sd(x) / sd(y). Standardized coefficients (betas) are required for driver analysis. Below is the code that works for me. X holds the independent variables, y is the dependent variable, and coefficients are the values extracted from the fitted OLS model via model.params.
sd_x = X.std()
sd_y = y.std()
beta_coefficients = []
# Iterate through independent variables and calculate beta coefficients
for col in X.columns:
    # Look up by column name (assumes coefficients is a pandas Series such as
    # model.params), so an intercept entry does not shift the positions
    beta = coefficients[col] * (sd_x[col] / sd_y)
    beta_coefficients.append([col, beta])
# Print beta coefficients
for var, beta in beta_coefficients:
    print(f' {var}: {beta}')