I'm trying to run a simple OLS regression with a restriction that the sum of the coefficients of two variables add up to 1.
I want:
Y = α + β1 * x1 + β2 * x2 + β3 * x3,
where β1 + β2 = 1
I have found how to make a relation between coefficients like:
β1 = 2* β2
But I haven't found how to make restrictions like:
β1 = 1 - β2
How would I do it in this simple example?
data <- data.frame(
A = c(1,2,3,4),
B = c(3,2,2,3),
C = c(3,3,2,3),
D = c(5,3,3,4)
)
lm(formula = 'D ~ A + B + C', data = data)
Thanks!
To have β1 + β2 = 1
the model you have to fit is
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
That is
Y = α + x1 + β2 * (x2 - x1) + β3 * x3
after substituting β1 = 1 - β2 ; x_new = x2 - x1
and the coefficient for x1
is 1.
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)
Y = α + x1 + β2 * (x2 - x1) + β3 * (x3 - x1)
after substituting β1 = 1 - β2 - β3
I think the pattern is clear... you just have to subtract one variable, x1
, from the remaining variables( x2
, x3
, ...
) and have the coefficient of that variable, x1
, to 1.
# Data
df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3, collaapse=""))
# β1 + β2 = 1
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
coef_2 <- coef(fit)
beta_1 <- 1 - coef_2[2]
beta_2 <- coef_2[2]
1) CVXR We can compute the coefficients using CVXR directly by specifying the objective and constraint. We assume that D is the response, the coefficients of A and B must sum to 1, b[1] is the intercept and b[2], b[3] and b[4] are the coefficients of A, B and C respectively.
library(CVXR)
b <- Variable(4)
X <- cbind(1, as.matrix(data[-4]))
obj <- Minimize(sum((data$D - X %*% b)^2))
constraints <- list(b[2] + b[3] == 1)
problem <- Problem(obj, constraints)
soln <- solve(problem)
bval <- soln$getValue(b)
bval
## [,1]
## [1,] 1.6428605
## [2,] -0.3571428
## [3,] 1.3571428
## [4,] -0.1428588
The objective is the residual sum of squares and it equals:
soln$value
## [1] 0.07142857
2) pracma We can also use the pracma package to compute the coefficients. We specify the X matrix, response vector, the constraint matrix (in this case the vector given as the third argument is regarded as a one row matrix) and the right hand side of the constraint.
library(pracma)
lsqlincon(X, data$D, Aeq = c(0, 1, 1, 0), beq = 1) # X is from above
## [1] 1.6428571 -0.3571429 1.3571429 -0.1428571
3) limSolve This package can also solve for the coefficients of regression problems with constraints. The arguments are the same as in (2).
library(limSolve)
lsei(X, data$D, c(0, 1, 1, 0), 1)
giving:
$X
A B C
1.6428571 -0.3571429 1.3571429 -0.1428571
$residualNorm
[1] 0
$solutionNorm
[1] 0.07142857
$IsError
[1] FALSE
$type
[1] "lsei"
We can double check the above by using the lm
approach in the other answer:
lm(D ~ I(A-B) + C + offset(B), data)
giving:
Call:
lm(formula = D ~ I(A - B) + C + offset(B), data = data)
Coefficients:
(Intercept) I(A - B) C
1.6429 -0.3571 -0.1429
The I(AB)
coefficient equals the coefficient of A
in the original formulation and one minus it is the coefficient of C
. We see that all approaches do lead to the same coefficients.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.