Multiple Linear Regression using interaction terms in Python

I am fitting a model with interaction terms:

import statsmodels.formula.api as smf
est = smf.ols(formula='mdvis ~ hlthp * logincome', data=df).fit()

I get a pretty good score with this linear regression: an R-squared of around 97%.

So my question is:
When predicting with interaction terms, how do I evaluate the model using train/test data, and how do I assess statistical significance using cross-validation?

Using interaction terms is just a convenient way to build the exog (design) matrix for the regression; it doesn't change the logic of cross-validation.

Split your dataframe into train and test samples:

train = df.sample(frac=0.8)
test  = df.drop(train.index)
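A small variant of this split, sketched on a hypothetical stand-in DataFrame: passing `random_state` (42 here is an arbitrary choice) makes the 80/20 split reproducible between runs. The `df.drop(train.index)` trick assumes df has a unique index (such as the default RangeIndex); with duplicate labels it would drop more rows than intended.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "mdvis": rng.poisson(3, 50),
    "hlthp": rng.integers(0, 2, 50),
    "logincome": rng.normal(8, 1, 50),
})

# random_state fixes the sampled rows, so the split is reproducible
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)  # relies on df having a unique index

print(len(train), len(test))                        # 40 10
print(train.index.intersection(test.index).empty)   # True: no row is in both sets
```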

Then fit the model on the train data:

res = smf.ols(formula='mdvis ~ hlthp * logincome', data=train).fit()
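For the significance part of the question, the fitted result already carries the usual OLS t-test for each coefficient, including the interaction term. A minimal sketch on hypothetical synthetic data follows; note that this is the standard in-sample test on the training rows, not a cross-validated one.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (the real df is not available here)
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({"hlthp": rng.integers(0, 2, n), "logincome": rng.normal(8, 1, n)})
df["mdvis"] = (1 + 0.5 * df["logincome"]
               + 0.8 * df["hlthp"] * df["logincome"] + rng.normal(0, 1, n))

train = df.sample(frac=0.8, random_state=0)
res = smf.ols(formula="mdvis ~ hlthp * logincome", data=train).fit()

# t-test of the interaction coefficient, estimated on the training rows only
coef = res.params["hlthp:logincome"]
pval = res.pvalues["hlthp:logincome"]
low, high = res.conf_int().loc["hlthp:logincome"]  # default 95% interval
print(f"interaction = {coef:.3f}, p = {pval:.3g}, 95% CI = [{low:.3f}, {high:.3f}]")
```

`res.summary()` prints the same numbers for every term in one table.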

Predict on the whole dataset (train and test):

df['predict']=res.predict(exog=df)
df['delta']  = df['predict']-df['mdvis']
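Besides the spread of the residuals, two common held-out metrics are RMSE and out-of-sample R² (1 − SS_res / SS_tot computed on the test rows only). The sketch below repeats the steps above on hypothetical synthetic data and computes both per sample; the helper name `r2_and_rmse` is illustrative, not part of any library. On the training rows the R² matches `res.rsquared`, so the 97% from the question is an in-sample number and the test-row value is the one to compare against it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the real df
rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({"hlthp": rng.integers(0, 2, n), "logincome": rng.normal(8, 1, n)})
df["mdvis"] = (1 + 0.5 * df["logincome"]
               + 0.8 * df["hlthp"] * df["logincome"] + rng.normal(0, 1, n))

train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)
res = smf.ols(formula="mdvis ~ hlthp * logincome", data=train).fit()

df["predict"] = res.predict(exog=df)
df["delta"] = df["predict"] - df["mdvis"]

def r2_and_rmse(idx):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot and RMSE on the rows in idx."""
    y = df.loc[idx, "mdvis"]
    resid = df.loc[idx, "delta"]
    ss_res = (resid ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot, np.sqrt((resid ** 2).mean())

r2_train, rmse_train = r2_and_rmse(train.index)
r2_test, rmse_test = r2_and_rmse(test.index)
print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}")
print(f"train RMSE = {rmse_train:.3f}, test RMSE = {rmse_test:.3f}")
```

Unlike the standard deviation of the residuals, RMSE also penalizes a systematic bias in the test predictions.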

Finally, compute statistics on the train and test samples separately, as needed (here I calculate the standard deviation of the residuals):

std_train = df.loc[train.index, 'delta'].std()
std_test  = df.loc[test.index, 'delta'].std()
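A single split gives one noisy estimate. One way to extend it toward the cross-validation part of the question is k-fold cross-validation: repeat the fit/predict step on k shuffled folds and compare the held-out error of the interaction model with the additive model `mdvis ~ hlthp + logincome`. This is a swapped-in technique, sketched on hypothetical synthetic data with an illustrative helper `cv_rmse`. It shows whether the interaction improves out-of-sample prediction; it is not a formal significance test, and fold scores are not independent, so p-values from them should be treated with caution.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the real df
rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({"hlthp": rng.integers(0, 2, n), "logincome": rng.normal(8, 1, n)})
df["mdvis"] = (1 + 0.5 * df["logincome"]
               + 0.8 * df["hlthp"] * df["logincome"] + rng.normal(0, 1, n))

def cv_rmse(formula, data, target="mdvis", k=5, seed=0):
    """Hypothetical helper: held-out RMSE of an OLS formula on each of k shuffled folds."""
    order = np.random.default_rng(seed).permutation(len(data))
    folds = np.array_split(order, k)
    scores = []
    for i in range(k):
        test = data.iloc[folds[i]]
        train = data.iloc[np.concatenate([folds[j] for j in range(k) if j != i])]
        res = smf.ols(formula, data=train).fit()
        resid = test[target] - res.predict(test)
        scores.append(np.sqrt((resid ** 2).mean()))
    return np.array(scores)

# Same seed, so both models are scored on exactly the same folds (paired comparison)
with_inter = cv_rmse("mdvis ~ hlthp * logincome", df)
additive = cv_rmse("mdvis ~ hlthp + logincome", df)
print(f"mean CV RMSE with interaction: {with_inter.mean():.3f}")
print(f"mean CV RMSE additive model:   {additive.mean():.3f}")
print("per-fold RMSE differences:", np.round(additive - with_inter, 3))
```

Using the same folds for both formulas keeps the comparison paired, so a consistently positive per-fold difference is more informative than the two means alone.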
