Python Function to Compute a Beta Matrix
I'm looking for an efficient function to automatically produce betas for every possible multiple regression model, given a dependent variable and a set of predictors as a DataFrame in Python.
For example, given this set of data:
https://i.stack.imgur.com/YuPuv.jpg
The dependent variable is 'Cases per Capita' and the columns following it are the predictor variables.
In a simpler example:
Student    Grade    Hours Slept    Hours Studied    ...
---------  -------  -------------  ---------------  -----
A          90       9              1                ...
B          85       7              2                ...
C          100      4              5                ...
...        ...      ...            ...              ...
where the beta matrix output would look like this:
Regression    Hours Slept    Hours Studied
------------  -------------  ---------------
1             #              N/A
2             N/A            #
3             #              #
The table would have 2^n - 1 rows, where n is the number of predictor variables. With 5 predictors and 1 dependent variable, there would be 31 regressions, each using a different combination of predictors and so producing a different set of betas.
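As a quick check of that count, here is a minimal sketch that enumerates every non-empty subset of five predictors with `itertools.combinations`. The predictor names `x1` to `x5` are placeholders and do not come from the dataset above.

```python
from itertools import combinations

# Hypothetical predictor names, used only to illustrate the count.
predictors = ["x1", "x2", "x3", "x4", "x5"]

# Every non-empty subset: all combinations of size 1, 2, ..., n.
subsets = [
    combo
    for size in range(1, len(predictors) + 1)
    for combo in combinations(predictors, size)
]

print(len(subsets))  # 2**5 - 1 = 31
```

Each tuple in `subsets` corresponds to one row of the beta matrix.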
The process is described in greater detail here and an actual solution that is written in R is posted here.
I am not aware of any package that already does this. But you can create all of those combinations (2^n-1), where n is the number of columns in X (the independent variables), fit a linear regression model for each combination, and then get the coefficients/betas from each model.
Here is how I would do it, hope this helps:
from itertools import combinations

import numpy as np
from sklearn import datasets, linear_model

# Test dataset. Note: load_boston was removed in scikit-learn 1.2, so on newer
# versions substitute any numeric feature matrix X and target vector y.
X, y = datasets.load_boston(return_X_y=True)
X = X[:, :3]  # Original X has 13 columns; keep only n=3 of them

# Create all 2^n-1 (here 7, because n=3) combinations of columns,
# where n is the number of features/independent variables
all_combs = []
for i in range(X.shape[1]):
    all_combs.extend(combinations(range(X.shape[1]), i + 1))

# Print the 2^n-1 combinations
print('2^n-1 combinations are:')
print(all_combs)

# Create the betas/coefficients matrix, filled with NaN, with 2^n-1 rows
# and one column per column of X (np.NaN was removed in NumPy 2.0; use np.nan)
betas = np.full((len(all_combs), X.shape[1]), np.nan)

# Fit a model for each combination of columns and store its coefficients
# in the matching columns of that model's row
lr = linear_model.LinearRegression()
for regression_no, comb in enumerate(all_combs):
    lr.fit(X[:, comb], y)
    betas[regression_no, comb] = lr.coef_

# Print the coefficients of each model
print('Regression No'.center(15)+" ".join(['column {}'.format(i).center(10) for i in range(X.shape[1])]))
print('_'*50)
for index, beta in enumerate(betas):
    print('{}'.format(index + 1).center(15), " ".join(['{:.4f}'.format(beta[i]).center(10) for i in range(X.shape[1])]))
results in
2^n-1 combinations are:
[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
 Regression No  column 0   column 1   column 2
__________________________________________________
       1         -0.4152      nan        nan
       2           nan       0.1421      nan
       3           nan        nan      -0.6485
       4         -0.3521     0.1161      nan
       5         -0.2455      nan      -0.5234
       6           nan       0.0564    -0.5462
       7         -0.2486     0.0585    -0.4156
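Since the question asks for a function that takes a DataFrame, the same idea could be wrapped up roughly as follows. This is only a sketch under a few assumptions: the name `beta_matrix` and its `dependent` argument are made up for illustration, every column other than the dependent one is treated as a numeric predictor, and `np.linalg.lstsq` with an explicit intercept column is swapped in for `LinearRegression` so that only NumPy and pandas are needed. The data is just the three rows shown in the question's simpler example.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def beta_matrix(df, dependent):
    """Return one row of OLS slopes per non-empty subset of predictors.

    Hypothetical helper: every column except `dependent` is assumed to be
    a numeric predictor. Predictors not used in a model are left as NaN.
    """
    predictors = [c for c in df.columns if c != dependent]
    y = df[dependent].to_numpy(dtype=float)
    rows = []
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            # Design matrix: an intercept column plus the chosen predictors.
            A = np.column_stack(
                [np.ones(len(df)), df[list(subset)].to_numpy(dtype=float)]
            )
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            # coef[0] is the intercept; keep only the slopes.
            row = dict.fromkeys(predictors, np.nan)
            row.update(zip(subset, coef[1:]))
            rows.append(row)
    out = pd.DataFrame(rows, columns=predictors)
    out.index = pd.RangeIndex(1, len(out) + 1, name="Regression")
    return out


# The three rows from the question's simpler example.
grades = pd.DataFrame({
    "Grade": [90, 85, 100],
    "Hours Slept": [9, 7, 4],
    "Hours Studied": [1, 2, 5],
})
betas = beta_matrix(grades, "Grade")
print(betas)
```

The result has the same layout as the table in the question: one row per regression, one column per predictor, and NaN where a predictor was left out of that model. Keep in mind that the number of rows doubles with every added predictor, so this brute-force approach is only practical for a modest number of columns.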