R: Force regression coefficients to add up to 1

I'm trying to run a simple OLS regression with the restriction that the coefficients of two variables sum to 1.

I want:

Y = α + β1 * x1 + β2 * x2 + β3 * x3,
where β1 + β2 = 1

I have found how to impose a relation between coefficients such as:

β1 = 2* β2

But I haven't found how to impose a restriction like:

β1 = 1 - β2

How would I do it in this simple example?

data <- data.frame(
  A = c(1,2,3,4),
  B = c(3,2,2,3),
  C = c(3,3,2,3),
  D = c(5,3,3,4)
)

lm(D ~ A + B + C, data = data)

Thanks!

β1 + β2 = 1

To impose β1 + β2 = 1, the model to fit is

fit <- lm(Y ~  offset(x1) + I(x2 - x1) + x3, data = df)

That is

Y = α + x1 + β2 * (x2 - x1) + β3 * x3

after substituting β1 = 1 - β2. The new regressor is x2 - x1, and offset(x1) fixes the coefficient of x1 at 1, so β2 is estimated as the coefficient of (x2 - x1).


β1 + β2 + β3 = 1

fit <- lm(Y ~  offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)

Y = α + x1 + β2 * (x2 - x1) + β3 * (x3 - x1)

after substituting β1 = 1 - β2 - β3
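
As a minimal sketch (using the same iris-based data frame that the example further below sets up), the three-coefficient case works the same way:

df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3))

# offset(x1) fixes x1's coefficient at 1; the differences carry β2 and β3
fit3 <- lm(Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)
beta_2 <- coef(fit3)["I(x2 - x1)"]
beta_3 <- coef(fit3)["I(x3 - x1)"]
beta_1 <- 1 - beta_2 - beta_3  # recovered from the constraint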


β1 + β2 + β3 + ... = 1

I think the pattern is clear... you just have to subtract one variable, x1, from the remaining variables (x2, x3, ...) and fix the coefficient of that variable, x1, at 1 with offset(). A helper that builds such formulas is sketched below.
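
As a sketch of that pattern, a small helper (hypothetical, not from any package) can build the formula for any set of variables whose coefficients must sum to 1:

# Hypothetical helper: builds Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1) + ...
constrained_formula <- function(response, vars) {
  diffs <- paste0("I(", vars[-1], " - ", vars[1], ")")
  as.formula(paste(response, "~ offset(", vars[1], ") +",
                   paste(diffs, collapse = " + ")))
}

constrained_formula("Y", c("x1", "x2", "x3"))
## Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1)

Any unconstrained regressors can then be added with update(), e.g. update(f, . ~ . + x4).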


Example β1 + β2 = 1

# Data
df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3))

# β1 + β2 = 1
fit <- lm(Y ~  offset(x1) + I(x2 - x1) + x3, data = df)
coef_2 <- coef(fit)      # (Intercept), I(x2 - x1), x3
beta_1 <- 1 - coef_2[2]  # β1, recovered from the constraint
beta_2 <- coef_2[2]      # β2 is the coefficient of I(x2 - x1)
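
As a quick sanity check (a sketch using the objects above), the recovered coefficients satisfy the constraint by construction and reproduce the fitted values in the original parameterization:

beta_1 + beta_2  # equals 1 by construction
all.equal(unname(fitted(fit)),
          unname(coef_2[1] + beta_1 * df$x1 + beta_2 * df$x2 + coef_2[3] * df$x3))
## [1] TRUE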

1) CVXR We can compute the coefficients using CVXR directly by specifying the objective and constraint. We assume that D is the response, the coefficients of A and B must sum to 1, b[1] is the intercept and b[2], b[3] and b[4] are the coefficients of A, B and C respectively.

library(CVXR)

b <- Variable(4)
X <- cbind(1, as.matrix(data[-4]))
obj <- Minimize(sum((data$D - X %*% b)^2))
constraints <- list(b[2] + b[3] == 1)
problem <- Problem(obj, constraints)
soln <- solve(problem)

bval <- soln$getValue(b)
bval
##            [,1]
## [1,]  1.6428605
## [2,] -0.3571428
## [3,]  1.3571428
## [4,] -0.1428588

The objective is the residual sum of squares and it equals:

soln$value
## [1] 0.07142857
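
The constraint can be verified directly: the estimated coefficients of A and B sum to 1 up to solver tolerance:

sum(bval[2:3])
## [1] 1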

2) pracma We can also use the pracma package to compute the coefficients. We specify the X matrix, the response vector, the equality-constraint matrix Aeq (here given as a vector, which is treated as a one-row matrix) and its right hand side beq.

library(pracma)

lsqlincon(X, data$D, Aeq = c(0, 1, 1, 0), beq = 1) # X is from above
## [1]  1.6428571 -0.3571429  1.3571429 -0.1428571

3) limSolve This package can also solve regression problems with equality constraints. Its arguments correspond to those in (2): the design matrix A, the response B, the constraint matrix E and its right hand side F.

library(limSolve)
lsei(A = X, B = data$D, E = c(0, 1, 1, 0), F = 1)

giving:

$X
                    A          B          C 
 1.6428571 -0.3571429  1.3571429 -0.1428571 

$residualNorm
[1] 0

$solutionNorm
[1] 0.07142857

$IsError
[1] FALSE

$type
[1] "lsei"

Check

We can double check the above by using the lm approach in the other answer:

lm(D ~ I(A-B) + C + offset(B), data)

giving:

Call:
lm(formula = D ~ I(A - B) + C + offset(B), data = data)

Coefficients:
(Intercept)     I(A - B)            C  
     1.6429      -0.3571      -0.1429  

The I(A - B) coefficient equals the coefficient of A in the original formulation, and one minus it is the coefficient of B. We see that all approaches lead to the same coefficients.
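
In code, recovering B's coefficient from the lm fit is a one-liner (a short sketch using the fit above):

fit_lm <- lm(D ~ I(A - B) + C + offset(B), data)
b_A <- coef(fit_lm)["I(A - B)"]
b_B <- 1 - b_A
unname(c(b_A, b_B))
## [1] -0.3571429  1.3571429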
