Interaction between continuous and categorical variable in R: is there a way to include all categories?

Question

Is there a way to run a linear regression in R with interaction terms between a continuous and a categorical variable, but excluding the continuous variable itself as a separate term?

I am studying the relation between housing rents and dwelling floorspace. There are four different regions in my dataset, and I assume that the relation differs across them. I am regressing rent on region and on the interaction between floorspace and region, and I want coefficients on region and on the interaction terms, but using lm with an interaction term forces floorspace to appear as an independent variable, too.

This is what I get:

lm(formula = rent ~ factor(region) + factor(region) * floorspace, 
    data = mydataset)

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2                  -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3                  -0.23631    0.17870  -1.322 0.186078    
factor(region)4                  -0.49076    0.10329  -4.751 2.07e-06 ***
floorspace                       -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace        0.20481    0.02145   9.550  < 2e-16 ***
factor(region)3:floorspace       -0.00884    0.03987  -0.222 0.824552    
factor(region)4:floorspace        0.08022    0.02348   3.416 0.000638 ***

What I want instead is this:

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2                  -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3                  -0.23631    0.17870  -1.322 0.186078    
factor(region)4                  -0.49076    0.10329  -4.751 2.07e-06 ***
factor(region)1:floorspace       -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace       -0.18177    ???????   ?????  ??????? 
factor(region)3:floorspace       -0.39543    ???????   ?????  ???????    
factor(region)4:floorspace       -0.30636    ???????   ?????  ???????

The reason is that, from an interpretation point of view, it makes more sense to show the effect of floorspace for each region separately, instead of showing it for region=1 as floorspace and, for the other regions, as the difference between that region's effect and the effect for region=1.

Answer 1

First I'll make a test data set:

    mydataset = data.frame(rent=runif(100), region=sample(1:4, 100, TRUE), floorspace=runif(100))

Take the linear term in floorspace out of the formula by subtraction:

    summary(lm(formula = rent ~ factor(region) + factor(region) * floorspace - floorspace, data=mydataset))

    Call:
    lm(formula = rent ~ factor(region) + factor(region) * floorspace - 
        floorspace, data = mydataset)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.52917 -0.26151  0.01225  0.24816  0.52392 

    Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
    (Intercept)                 0.50329    0.09238   5.448 4.23e-07 ***
    factor(region)2             0.01331    0.13804   0.096    0.923    
    factor(region)3             0.05716    0.16860   0.339    0.735    
    factor(region)4            -0.03252    0.16234  -0.200    0.842    
    factor(region)1:floorspace  0.16273    0.22805   0.714    0.477    
    factor(region)2:floorspace  0.01638    0.19894   0.082    0.935    
    factor(region)3:floorspace -0.14251    0.20262  -0.703    0.484    
    factor(region)4:floorspace -0.05094    0.24191  -0.211    0.834
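
As a sanity check, here is a minimal sketch showing that removing the floorspace main effect only reparameterises the model: the fit is unchanged, and each per-region slope equals the baseline slope plus the corresponding difference from the default output. The seed and the object names full, nested, slopes_from_full and slopes_nested are my own choices for illustration, and the data are simulated as above, so the numbers will not match the question's. The formula with `:` in place of `* ... - floorspace` should, as far as I can tell, specify the same model.

```r
# Simulated data; variable names mirror the question, values are made up.
set.seed(1)  # arbitrary seed, only so the run is repeatable
mydataset <- data.frame(
  rent       = runif(100),
  region     = sample(1:4, 100, replace = TRUE),
  floorspace = runif(100)
)

# Default parameterisation: slope for region 1 plus differences from it.
full <- lm(rent ~ factor(region) * floorspace, data = mydataset)

# One slope per region: region main effect plus the interaction only.
nested <- lm(rent ~ factor(region) + factor(region):floorspace,
             data = mydataset)

# Rebuild the per-region slopes from the default parameterisation.
b <- coef(full)
slopes_from_full <- c(
  b[["floorspace"]],
  b[["floorspace"]] + b[["factor(region)2:floorspace"]],
  b[["floorspace"]] + b[["factor(region)3:floorspace"]],
  b[["floorspace"]] + b[["factor(region)4:floorspace"]]
)

# Per-region slopes as reported directly by the reparameterised model.
slope_names   <- paste0("factor(region)", 1:4, ":floorspace")
slopes_nested <- unname(coef(nested)[slope_names])

# Both comparisons should agree up to numerical tolerance.
print(all.equal(slopes_from_full, slopes_nested))
print(all.equal(fitted(full), fitted(nested)))

# Standard errors, t values and p-values for each region's own slope.
print(summary(nested)$coefficients[slope_names, ])
```

The last line prints the standard errors and p-values for each region's own slope, which are the entries marked with question marks in the table the question asks for.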
