Keep NaN in results when performing statsmodels OLS regression in Python

I want to perform an OLS regression using Python's statsmodels package, but my dataset contains NaNs. I know I can pass the missing='drop' option when performing the OLS regression, but then some of the results (fitted values or residuals) have a different length from the original y variable.

I have the following code as an example:

import numpy as np
import statsmodels.api as sm

yvars = np.array([1.0, 6.0, 3.0, 2.0, 8.0, 4.0, 5.0, 2.0, np.nan, 3.0])
xvars = np.array(
    [
        [1.0, 8.0],
        [8.0, np.nan],
        [np.nan, 3.0],
        [3.0, 6.0],
        [5.0, 3.0],
        [2.0, 7.0],
        [1.0, 3.0],
        [2.0, 2.0],
        [7.0, 9.0],
        [3.0, 1.0],
    ]
)

# missing='drop' removes every row with a NaN in yvars or xvars before fitting
res = sm.OLS(yvars, sm.add_constant(xvars), missing='drop').fit()
res.resid

The result is as follows:

array([-0.71907958, -1.9012464 ,  1.78811122,  1.18983701,  2.63854267,
       -1.45254075, -1.54362416])

My question: the result is an array of length 7 (after dropping NaNs), but yvars has length 10. How can I return residuals of the same length as yvars, with NaN in every position where yvars or the corresponding row of xvars contains at least one NaN?

Basically, the result I want to get is:

array([-0.71907958,  nan,  nan, -1.9012464 ,  1.78811122,  1.18983701,  2.63854267,
       -1.45254075,  nan, -1.54362416])

That's too difficult to implement in statsmodels, so users need to handle it themselves.

Results attributes such as fittedvalues and resid refer to the sample actually used in the estimation, i.e. after the rows with missing values have been dropped.

The predict method of the results instance preserves NaNs in the exog array you pass to it, but other methods and attributes do not. For example, with the full exog from the question (including the rows with NaNs):
res.predict(sm.add_constant(xvars))

One workaround:

Use a pandas DataFrame for the data.
Then, as far as I remember, resid and fittedvalues of the results instance are pandas Series indexed by the rows actually used. That index can be used to align them with the original index or DataFrame, with NaN in the dropped rows. That's what the predict method does.
