Dual beta in Python - multiple linear regression with dummy variable in statsmodels

I am trying to calculate the dual beta in Python using a statsmodels regression. Unfortunately, I am getting an error message.

The regression equation for dual betas is given here

Dual Beta Formula

I am neglecting the risk-free rate (rf) for now, but the implementation should be similar once I add it. For now my code looks as follows, where my 'spx.xlsx' file simply has two columns with returns, called 'SPXrets' and 'AAPLrets' (plus one column with dates):

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

df = pd.read_excel('spx.xlsx')

mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()

This raises a patsy error:

PatsyError: intercept term cannot interact with anything else AAPLrets ~ SPXrets:c(D) + SPXrets:(1-c(D))

Grateful for any help - many thanks!


After my initial suggestions, the OP has changed both the title and the provided code snippet. My suggestions have since been edited accordingly.

New suggestion:

I suspect there is a problem with your dataset. Please tell us a little more about the data source, how you loaded the data, what it looks like (structure), and what type each column has (string, float, etc.).
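A quick way to collect that information is to print the first rows and the column dtypes right after loading. The sketch below is only an illustration: the small frame `demo` is a made-up stand-in for the result of `pd.read_excel('spx.xlsx')`, and storing one return column as text is an assumption chosen to show what a problem column looks like.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel('spx.xlsx'); all values are made up
demo = pd.DataFrame({
    'Date': ['2019-01-02', '2019-01-03', '2019-01-04'],
    'SPXrets': ['0.0013', '-0.0248', '0.0343'],  # text, as can happen after a spreadsheet import
    'AAPLrets': [0.0011, -0.0996, 0.0427],
})

print(demo.head())
print(demo.dtypes)  # a non-float dtype on a return column is a warning sign

# Coerce the return columns to floats; cells that cannot be parsed become NaN
for col in ['SPXrets', 'AAPLrets']:
    demo[col] = pd.to_numeric(demo[col], errors='coerce')

print(demo.dtypes)
```

If the return columns already show up as floats, the dataset is probably not the cause.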

What I can tell you right now is that your snippet runs fine with some sample data like this:


               CONret  DAXret:c(D)  DAXret:(1-c(D))  AAPLrets  SPXrets  dummy
2017-01-08     109          107              122       101      100      0
2017-01-09     117          108              124       113      147      0
2017-01-10     142          108              130       107      103      1
2017-01-11     106          121              149       103      104      1
2017-01-12     124          149              143       112      126      0


                            OLS Regression Results                            
Dep. Variable:               AAPLrets   R-squared:                       0.095
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.044
Date:                Thu, 14 Feb 2019   Prob (F-statistic):              0.331
Time:                        16:00:01   Log-Likelihood:                -48.388
No. Observations:                  12   AIC:                             100.8
Df Residuals:                      10   BIC:                             101.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
Intercept     84.3198     31.143      2.708      0.022      14.929     153.711
SPXrets        0.2635      0.258      1.022      0.331      -0.311       0.838
Omnibus:                        5.649   Durbin-Watson:                   1.882
Prob(Omnibus):                  0.059   Jarque-Bera (JB):                2.933
Skew:                           1.202   Prob(JB):                        0.231
Kurtosis:                       3.290   Cond. No.                         872.

Here's the whole thing:

# imports
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import statsmodels.api as sm

# sample data
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])

mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()
print(res.summary())
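As for the PatsyError quoted in the question: inside a patsy formula, `1` stands for the intercept and `-` removes a term, so `SPXrets:(1-c(D))` is read as an interaction between `SPXrets` and the intercept, which patsy rejects. Wrapping the arithmetic in `I(...)` makes patsy evaluate it as ordinary Python instead. Note also that patsy's built-in categorical wrapper is spelled `C(...)`, not `c(...)`. Below is a minimal sketch of the dual-beta regression written that way. The data is synthetic, and the dummy column `D` (1 when the market return is positive, 0 otherwise) is my assumption about how the dummy is defined.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic returns for illustration only (not the OP's data)
rng = np.random.default_rng(0)
n = 250
spx = rng.normal(0.0, 0.01, n)
D = (spx > 0).astype(int)  # assumed dummy: 1 on up-market days, 0 otherwise

# Simulated stock returns with an up-market beta of 1.2 and a down-market beta of 0.8
aapl = 1.2 * spx * D + 0.8 * spx * (1 - D) + rng.normal(0.0, 0.002, n)
demo = pd.DataFrame({'AAPLrets': aapl, 'SPXrets': spx, 'D': D})

# I(1 - D) is evaluated as plain arithmetic rather than as formula operators
dual = smf.ols('AAPLrets ~ SPXrets:D + SPXrets:I(1 - D)', data=demo).fit()
print(dual.params)
```

The two slope coefficients are the up-market and down-market betas. Once the risk-free rate is added, the same formula should work on excess returns.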

Another suggestion:

Personally, I'd feel much more comfortable without patsy.

The snippet below will let you run a linear regression and select whether to return the model summary, or a dataframe with other details like coefficient p-values and r-squared.

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# sample data
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])

def LinReg(df, y, x, const, results):

    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]], 
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression Coefficients
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']

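    # Return the full statsmodels summary, or the compact results dataframe
    if results == 'summary':
        return model.summary()
    else:
        return df_res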

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')


Test run 1:

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = '')

Output 1:

Y                                                       CONret
R2                                                      0.0813
p            [0.13194822614949883, 0.45726622261432304, 0.9...
start                                      2017-01-01 00:00:00
stop                                       2017-01-12 00:00:00
obs                                                         12
X                        [DAXret:c(D), DAXret:(1-c(D)), dummy]
Independent       [const, DAXret:c(D), DAXret:(1-c(D)), dummy]
beta                                [88.94, 0.24, -0.01, 2.20]

Test run 2:

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')

Output 2:

                            OLS Regression Results                            
Dep. Variable:                 CONret   R-squared:                       0.081
Model:                            OLS   Adj. R-squared:                 -0.263
Method:                 Least Squares   F-statistic:                    0.2361
Date:                Thu, 14 Feb 2019   Prob (F-statistic):              0.869
Time:                        16:04:02   Log-Likelihood:                -47.138
No. Observations:                  12   AIC:                             102.3
Df Residuals:                       8   BIC:                             104.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                      coef    std err          t      P>|t|      [0.025      0.975]
const              88.9438     53.019      1.678      0.132     -33.318     211.205
DAXret:c(D)         0.2350      0.301      0.781      0.457      -0.459       0.929
DAXret:(1-c(D))    -0.0060      0.391     -0.015      0.988      -0.908       0.896
dummy               2.2005      8.973      0.245      0.812     -18.490      22.891
Omnibus:                        1.025   Durbin-Watson:                   2.354
Prob(Omnibus):                  0.599   Jarque-Bera (JB):                0.720
Skew:                           0.540   Prob(JB):                        0.698
Kurtosis:                       2.477   Cond. No.                     2.15e+03

