What test (null hypothesis) does a model's `f_pvalue` correspond to?

What is the null hypothesis behind the f_pvalue attribute of an OLSResults object? The docstring is not particularly useful.

At first I thought the null hypothesis was that all estimated coefficients are simultaneously zero (including the constant term). However, I am starting to think that the hypothesis being tested is that all estimated parameters except the constant term are simultaneously zero (i.e. b1 = b2 = ... = bp = 0, excluding b0).

For example, suppose y is an array of targets and X is a NumPy matrix of features (a constant column plus p features).

# Silly example
from statsmodels.api import OLS
m = OLS(endog=y, exog=X).fit()

# What is being tested here?
print(m.f_pvalue)

Does anyone know what the null hypothesis is?

Thanks to @Josef for clearing things up. As the documentation states:

F-statistic of the fully specified model.

Calculated as the mean squared error of the model divided by the mean squared error of the residuals if the nonrobust covariance is used. Otherwise computed using a Wald-like quadratic form that tests whether all coefficients (excluding the constant) are zero.

And just to prove that this is the case:

# Libraries
import numpy as np
import pandas as pd
from statsmodels.api import OLS
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston

# Load the dataset once and reuse it
boston = load_boston()

# Load target
y = pd.DataFrame(boston['target'], columns=['price'])

# Load features
X = pd.DataFrame(boston['data'], columns=boston['feature_names'])

# Add constant
X['CONST'] = 1

# One feature
m1 = OLS(endog=y, exog=X[['CONST','CRIM']]).fit()
print(f'm1 pvalue: {m1.f_pvalue}')

# Multiple features
m2 = OLS(endog=y, exog=X[['CONST','CRIM','AGE']]).fit()
print(f'm2 pvalue: {m2.f_pvalue}')

# Manually test H0: all coefficients are zero (excluding b0).
# Each row of r_matrix is one restriction; the CONST column is left unrestricted.
print('Manual F-test for m1', m1.f_test(r_matrix=np.array([[0, 1]])),
      'Manual F-test for m2', m2.f_test(r_matrix=np.array([[0, 1, 0], [0, 0, 1]])),
      sep='\n')

# Output
"""
> m1 pvalue: 1.1739870821944483e-19
> m2 pvalue: 2.2015246345918656e-27
> Manual F-test for m1
> <F test: F=array([[89.48611476]]), p=1.1739870821945733e-19, df_denom=504, df_num=1>
> Manual F-test for m2
> <F test: F=array([[69.51929476]]), p=2.2015246345920063e-27, df_denom=503, df_num=2>
"""

So yes, f_pvalue matches the p-value obtained by testing the null hypothesis manually: all coefficients except the constant are zero.
