简体   繁体   中英

Interaction between continuous and categorical variable in R: is there a way to include all categories?

Is there a way to run a linear regression with R with interaction terms between continuous and categorical variable but excluding the continuous variable itself?

I am studying relation between housing rents and dwell floorspace. There are four different regions in my dataset, and I assume that the relation is different across them. I am using linear regression of rent on region and interaction between floorspace and region , and I want to have coefficients on region and on interaction terms, but using lm with interaction term forces floorspace to appear as independent variable, too.

That's how it goes:

lm(formula = rent ~ factor(region) + factor(region) * floorspace, 
    data = mydataset)

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2                  -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3                  -0.23631    0.17870  -1.322 0.186078    
factor(region)4                  -0.49076    0.10329  -4.751 2.07e-06 ***
floorspace                       -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace        0.20481    0.02145   9.550  < 2e-16 ***
factor(region)3:floorspace       -0.00884    0.03987  -0.222 0.824552    
factor(region)4:floorspace        0.08022    0.02348   3.416 0.000638 ***

What I want instead is this:

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2                  -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3                  -0.23631    0.17870  -1.322 0.186078    
factor(region)4                  -0.49076    0.10329  -4.751 2.07e-06 ***
factor(region)1:floorspace       -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace       -0.18177    ???????   ?????  ??????? 
factor(region)3:floorspace       -0.39543    ???????   ?????  ???????    
factor(region)4:floorspace       -0.30636    ???????   ?????  ??????? 

Reason is that from interpretation point of view it makes more sense to show effect of floorspace for each region separately, instead of showing it for region=1 with floorspace , and the rest as difference between the effect for the given region and the region=1

First I'll make a test data set with: mydataset = data.frame(rent=runif(100), region=sample(1:4, 100,TRUE), floorspace=runif(100))

Take the linear term in floorspace out of the formula by subtraction:

    summary(lm(formula = rent ~ factor(region) + factor(region) * floorspace - floorspace, data=mydataset))

    Call:
    lm(formula = rent ~ factor(region) + factor(region) * floorspace - 
        floorspace, data = mydataset)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.52917 -0.26151  0.01225  0.24816  0.52392 

    Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
    (Intercept)                 0.50329    0.09238   5.448 4.23e-07 ***
    factor(region)2             0.01331    0.13804   0.096    0.923    
    factor(region)3             0.05716    0.16860   0.339    0.735    
    factor(region)4            -0.03252    0.16234  -0.200    0.842    
    factor(region)1:floorspace  0.16273    0.22805   0.714    0.477    
    factor(region)2:floorspace  0.01638    0.19894   0.082    0.935    
    factor(region)3:floorspace -0.14251    0.20262  -0.703    0.484    
    factor(region)4:floorspace -0.05094    0.24191  -0.211    0.834    

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM