How to get standardised (beta) coefficients for multiple linear regression using statsmodels
When calling .summary() on an OLS model fitted with statsmodels (on pandas data), the OLS Regression Results include the following fields:
coef std err t P>|t| [0.025 0.975]
How can I get the standardised coefficients (which exclude the intercept), similar to what SPSS reports?
You just need to standardize your original DataFrame first (i.e., convert each column to z-scores) and then perform the linear regression.
Assume your DataFrame is named df and has independent variables x1, x2, and x3, and dependent variable y. Consider the following code:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)
# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()
# checking results
result.summary()
Now, the coef column will show you the standardized (beta) coefficients, so that you can compare their relative influence on your dependent variable.
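As a sanity check, here is a minimal sketch on synthetic data (the column names, sample size, and true coefficients are made up for illustration). It fits the same formula on the raw and on the z-scored DataFrame and compares the z-scored coefficients with the raw ones rescaled by sd(x) / sd(y), which should agree up to floating-point error.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data, purely illustrative: predictors on very different scales
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(10, 5, n),
    "x3": rng.normal(-3, 0.2, n),
})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.4 * df["x2"] + 3.0 * df["x3"] + rng.normal(0, 1, n)

formula = "y ~ x1 + x2 + x3"

# Fit on the raw data and on the z-scored data
raw_fit = smf.ols(formula, data=df).fit()
df_z = df.apply(stats.zscore)
std_fit = smf.ols(formula, data=df_z).fit()

# Rescale the raw slopes by sd(x) / sd(y); ddof=0 matches stats.zscore's default
predictors = ["x1", "x2", "x3"]
converted = raw_fit.params[predictors] * df[predictors].std(ddof=0) / df["y"].std(ddof=0)

print(pd.DataFrame({"z-scored fit": std_fit.params[predictors], "converted": converted}))
print("intercept of z-scored fit:", std_fit.params["Intercept"])
```

The intercept of the z-scored fit is numerically zero, which is why standardized output usually omits it.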
Notes:
- Please keep in mind that you need .dropna(). Otherwise, stats.zscore will return all NaN for a column if it has any missing values.
- Instead of using .select_dtypes(), you can select columns manually, but make sure all the columns you select are numeric.
- If you only care about the standardized (beta) coefficients, you can also use result.params to return only those. They will usually be displayed in scientific notation. You can use something like round(result.params, 5) to round them.
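To illustrate the .dropna() note, here is a small sketch with a made-up four-row DataFrame. It shows that a single missing value turns the whole z-scored column into NaN under scipy's default settings, and that dropping incomplete rows first avoids this.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy data, purely illustrative: x1 has one missing value
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0],
    "x2": [2.0, 4.0, 6.0, 8.0],
})

# Without dropna: the NaN propagates through the mean and std of x1
without_dropna = df.apply(stats.zscore)

# With dropna: the incomplete row is removed before standardizing
with_dropna = df.dropna().apply(stats.zscore)

print(without_dropna)
print(with_dropna)
```

Note that .dropna() removes the whole row, so the regression is then fitted on complete cases only.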
We can just transform the estimated params by the standard deviation of the exog. results.t_test(transformation) computes the parameter table for the linearly transformed variables.
AFAIR, the following should produce the beta coefficients and corresponding inferential statistics.
Compute the standard deviation of each exog column, but set it to 1 for the constant.
std = model.exog.std(0)
std[0] = 1
Then use results.t_test and look at the params_table. np.diag(std) creates a diagonal matrix that transforms the params.
tt = results.t_test(np.diag(std))
print(tt.summary())
tt.summary_frame()
You can convert unstandardized coefficients using the standard deviations of the variables. Standardized coefficients (beta) are required for driver analysis. Below is the code that works for me. X holds the independent variables, y is the dependent variable, and coefficients are the coef values extracted from the OLS fit via model.params.
sd_x = X.std()
sd_y = y.std()
beta_coefficients = []
# Iterate through independent variables and calculate beta coefficients
for col in X.columns:
    # Assumes coefficients is a pandas Series indexed by column name (as model.params is
    # when fitted on a DataFrame); lookup by name avoids an offset caused by the intercept
    beta = coefficients[col] * (sd_x[col] / sd_y)
    beta_coefficients.append([col, beta])
# Print beta coefficients
for var, beta in beta_coefficients:
    print(f' {var}: {beta}')