t-test for evaluating regression coefficients using statsmodels

Question

I have a dataset with 100+ features. I also have a small set of covariates.

I build an OLS linear model using statsmodels of the form y = x + C1 + C2 + C3 + C4 + ... + Cn, where x is a feature, C1 through Cn are the covariates, and y is the dependent variable.

I'm trying to perform hypothesis testing on the regression coefficients to test whether they are equal to 0. I figured a t-test would be the appropriate approach, but I'm not sure how to implement it in Python using statsmodels.

In particular, I know I'd want to use RegressionResults.t_test: http://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.t_test.html#statsmodels.regression.linear_model.RegressionResults.t_test

But I am not certain I understand the r_matrix parameter. What should I pass to it? I did look at the examples, but they are unclear to me.

Furthermore, I am not interested in running t-tests on the covariates themselves, only on the regression coefficient of x.

Any help appreciated!

Answer 1

Are you sure you don't just want statsmodels.regression.linear_model.OLS? Fitting it performs an OLS regression and makes available the parameter estimates and their corresponding p-values (and many other things).

import numpy as np
from statsmodels.regression import linear_model
from statsmodels.api import add_constant

Y = [1, 2, 3, 5, 6, 7, 9]
# Design matrix: a column of ones (intercept) plus one regressor 0..6
X = add_constant(np.arange(len(Y)))

model = linear_model.OLS(Y, X)
results = model.fit()
print(results.params)   # [ 0.75        1.32142857]  (intercept, slope)
print(results.pvalues)  # [  2.00489220e-02   4.16826428e-06]

These p-values come from the t-tests of the null hypothesis that each fitted parameter equals 0.

RegressionResults.t_test seems more useful for less conventional hypotheses, such as tests on linear combinations of coefficients.

Answer 1: score 5, accepted, 2017-08-16 14:46:10
