
How to get standardised (Beta) coefficients for multiple linear regression using statsmodels

When calling .summary() on an OLS model fitted with statsmodels on pandas data, the OLS Regression Results include the following fields.

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardised coefficients (which exclude the intercept), similar to what SPSS reports?

You just need to standardize your original DataFrame using a z distribution (i.e., z-scores) first and then perform a linear regression.

Assume your DataFrame is named df, with independent variables x1, x2, and x3 and dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results
result.summary()

Now the coef column shows the standardized (beta) coefficients, so you can compare their influence on your dependent variable.

Notes:

  1. Please keep in mind that you need .dropna(). Otherwise, stats.zscore will return all NaN for a column if it has any missing values.
  2. Instead of using .select_dtypes(), you can select columns manually, but make sure all the columns you select are numeric.
  3. If you only care about the standardized (beta) coefficients, you can also use result.params to return just those. They will usually be displayed in scientific notation. You can use something like round(result.params, 5) to round them.
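For anyone who wants to try the steps above end to end, here is a minimal sketch. The data is synthetic and made up purely for illustration (the column values, the "label" column, and the injected NaN are my assumptions, not part of the question); only the standardize-then-fit steps come from the answer.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(10, 2, n),
    "x2": rng.normal(0, 5, n),
    "x3": rng.normal(-3, 1, n),
})
df["y"] = 1.5 * df["x1"] - 0.4 * df["x2"] + 2.0 * df["x3"] + rng.normal(0, 1, n)
df["label"] = "a"          # non-numeric column, excluded by select_dtypes
df.loc[0, "x2"] = np.nan   # missing value, so that row is removed by dropna

# Standardize every numeric column to mean 0 and standard deviation 1.
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# Fit on the standardized data; coef now holds the beta coefficients.
result = smf.ols("y ~ x1 + x2 + x3", data=df_z).fit()

# Keep only the slopes (the intercept is numerically zero) and round them.
betas = result.params.drop("Intercept").round(5)
print(betas)
```

Because every variable is centered before fitting, the intercept comes out as (numerically) zero, which is why it can be dropped from the table.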

We can just transform the estimated params by the standard deviation of the exog. results.t_test(transformation) computes the parameter table for the linearly transformed variables.

AFAIR, the following should produce the beta coefficients and corresponding inferential statistics.

Compute the standard deviation of each exog column, but set it to 1 for the constant.

std = model.exog.std(0)
std[0] = 1

Then use results.t_test and look at the params_table. np.diag(std) creates a diagonal matrix that transforms the params.

tt = results.t_test(np.diag(std))
print(tt.summary())
tt.summary_frame()

You can convert unstandardized coefficients to standardized ones using the standard deviations of the variables. Standardized coefficients (beta) are a requirement for driver analysis. Below is the code that works for me. X holds the independent variables, y is the dependent variable, and coefficients are the coef values extracted from the OLS fit via model.params.

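# Standard deviation of each predictor column and of the response
sd_x = X.std()
sd_y = y.std()
beta_coefficients = []

# Iterate through independent variables and calculate beta coefficients
for col in X.columns:
    # Look up the coefficient by column name so that an intercept in
    # model.params cannot shift the alignment with X.columns
    beta = coefficients[col] * (sd_x[col] / sd_y)
    beta_coefficients.append([col, beta])

# Print beta coefficients
for var, beta in beta_coefficients:
    print(f'{var}: {beta}')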

