Getting transformed X values from OLS model using statsmodels

Question

I am trying to fit a linear regression. From the results I want to multiply each x by its own estimated coefficient: x_i · β_i.

However, I apply several transformations to x_i.

For example:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

def log_plus_1(x):
    return np.log(x + 1.0)

df = sm.datasets.get_rdataset("Guerry", "HistData").data
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
formule = 'Lottery ~ pow(Literacy,2) + log_plus_1(Wealth)'
mod = smf.ols(formula=formule, data=df)
res = mod.fit()
res.params

Now I need the values of pow(Literacy, 2) and log_plus_1(Wealth). Since they are computed when the model is built, I was hoping to get them back out of the model instead of transforming the original dataset again myself.

In R I would use res$model to get them.

Answer 1

The data is stored as attributes of the model, e.g. the design matrix is mod.exog and the dependent (response) variable is mod.endog.

(I'm not sure I remember the following details correctly: the transformed design matrix that patsy returns should, in this case, be a pandas DataFrame, and it should be stored in mod.data.orig_exog or something like that.)
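As a minimal sketch of the idea above, the transformed columns can be read from mod.exog and labelled with mod.exog_names, then multiplied by res.params. The synthetic DataFrame below is a hypothetical stand-in for the Guerry data (so the example runs without downloading anything), and the names X and contrib are illustrative, not part of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def log_plus_1(x):
    return np.log(x + 1.0)

# Hypothetical synthetic data standing in for the Guerry dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Literacy": rng.uniform(10, 80, size=50),
    "Wealth": rng.uniform(1, 90, size=50),
})
df["Lottery"] = (0.01 * df["Literacy"] ** 2
                 + 5.0 * log_plus_1(df["Wealth"])
                 + rng.normal(size=50))

mod = smf.ols("Lottery ~ pow(Literacy, 2) + log_plus_1(Wealth)", data=df)
res = mod.fit()

# mod.exog is the transformed design matrix (a NumPy array);
# mod.exog_names holds the matching column labels, including the intercept.
X = pd.DataFrame(mod.exog, columns=mod.exog_names, index=df.index)

# DataFrame * Series aligns on column labels, giving x_i * beta_i per column.
contrib = X * res.params

print(X.head())
print(contrib.head())
```

Summing contrib across columns should reproduce res.fittedvalues, which is a quick way to check that the columns and coefficients line up.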

res.predict handles the transformation automatically, i.e. patsy uses the formula information to transform the explanatory variables for prediction in the same way the data was transformed when the model was created.
predict returns only the prediction, not the internally transformed exog used for prediction.
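To illustrate the point about predict, here is a small sketch that passes untransformed columns to res.predict and compares the result with a manual calculation. The training data and the new_df rows are hypothetical values made up for the example; the coefficients are unpacked by position, assuming the term order follows the formula (intercept first).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def log_plus_1(x):
    return np.log(x + 1.0)

# Hypothetical training data, for illustration only.
rng = np.random.default_rng(1)
train = pd.DataFrame({
    "Literacy": rng.uniform(10, 80, size=40),
    "Wealth": rng.uniform(1, 90, size=40),
})
train["Lottery"] = (0.02 * train["Literacy"] ** 2
                    + 3.0 * log_plus_1(train["Wealth"])
                    + rng.normal(size=40))

res = smf.ols("Lottery ~ pow(Literacy, 2) + log_plus_1(Wealth)", data=train).fit()

# New observations are given on the ORIGINAL scale; the formula's
# transformations are applied internally before predicting.
new_df = pd.DataFrame({"Literacy": [20.0, 55.0], "Wealth": [5.0, 60.0]})
pred = res.predict(new_df)

# Manual equivalent: transform by hand, then apply the coefficients.
b0, b_lit, b_wealth = res.params.to_numpy()
manual = b0 + b_lit * new_df["Literacy"] ** 2 + b_wealth * log_plus_1(new_df["Wealth"])

print(pred)
print(manual)
```

The two results should agree, but note that pred contains only the predicted values; the transformed columns for new_df are not returned.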
