
R regularize coefficients in regression

I'm trying to use linear regression to find the best weighting for 3 models to predict an outcome. So there are 3 variables (x1, x2, x3) that are each predictions of the dependent variable, y. My question is: how do I run a regression with the constraint that the coefficients sum to 1? For example:

this is good:

y = .2(x1) + .4(x2) + .4(x3) 

since .2 + .4 + .4 = 1

this is no good:

y = 1.2(x1) + .4(x2) + .3(x3)

since 1.2 + .4 + .3 > 1

I'm looking to do this in R if possible. Thanks. Let me know if this needs to get moved to the stats area ('Cross-Validated').

EDIT:

The problem is to classify each row as 1 or 0. y holds the actual values (0 or 1) from the training set; x1 is the predicted values from a kNN model, x2 from a randomForest, and x3 from a gbm model. I'm trying to find the best weighting for each model, so each coefficient is <= 1 and the coefficients sum to 1. It would look something like this:

y/Actual value       knnPred      RfPred     gbmPred
      0                .1111       .0546       .03325
      1                .7778       .6245       .60985
      0                .3354       .1293       .33255
      0                .2235       .9987       .10393
      1                .9888       .6753       .88933
     ...                 ...         ...         ...

The measure for success is AUC. So I'm trying to set the coefficients to maximize AUC while making sure they sum to 1.
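Since AUC is the stated success measure, here is a minimal base-R sketch of scoring one candidate weighting via the rank-based (Mann-Whitney) formula. The `auc` helper and the choice of weights are illustrative, not from the question; the five rows are the ones shown above.

```r
# Rank-based AUC (Mann-Whitney U statistic), base R only
auc <- function(y, pred) {
  r  <- rank(pred)                # ranks of the blended scores
  n1 <- sum(y == 1)               # number of positives
  n0 <- sum(y == 0)               # number of negatives
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# The five rows from the table above
y   <- c(0, 1, 0, 0, 1)
knn <- c(.1111, .7778, .3354, .2235, .9888)
rf  <- c(.0546, .6245, .1293, .9987, .6753)
gbm <- c(.03325, .60985, .33255, .10393, .88933)

w <- c(.2, .4, .4)                # candidate weights, sum to 1
blend <- w[1] * knn + w[2] * rf + w[3] * gbm
auc(y, blend)                     # -> 1 for these five rows
```

Any of the weight-finding approaches in the answers below could use a function like this as the objective instead of MSE.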

There's very likely a better way that someone else will share, but you're looking for two parameters, b1 and b2, such that

b1 * x1 + b2 * x2 + (1 - b1 - b2) * x3

is close to y. To do that, I'd write an error function to minimize:

minimizeMe <- function(b, x, y) {  ## MSE; the third weight is 1 - b[1] - b[2]
    mean((b[1] * x[, 1] + b[2] * x[, 2] + (1 - sum(b)) * x[, 3] - y) ^ 2)
}

and throw it to optim

fit <- optim(par = c(.2, .4), fn = minimizeMe, x = cbind(x1, x2, x3), y = y)
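For reference, a self-contained run of this optim approach on simulated data. The data and the true weights (0.2, 0.5, 0.3) below are made up so we can check that the recovered weights land near them:

```r
set.seed(42)
n  <- 200
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- 0.2 * x1 + 0.5 * x2 + 0.3 * x3 + rnorm(n, sd = 0.01)

minimizeMe <- function(b, x, y) {  # MSE; the third weight is 1 - b[1] - b[2]
  mean((b[1] * x[, 1] + b[2] * x[, 2] + (1 - sum(b)) * x[, 3] - y) ^ 2)
}

fit <- optim(par = c(.2, .4), fn = minimizeMe, x = cbind(x1, x2, x3), y = y)
b <- c(fit$par, 1 - sum(fit$par))  # recover all three weights
b                                  # close to c(0.2, 0.5, 0.3)
sum(b)                             # exactly 1 by construction
```

Note that the sum-to-1 constraint holds by construction here, but nothing stops an individual weight from going negative or above 1; the lsei answer below handles that.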

There's no data to test on, but try:

mod1 <- lm(y ~ 0 + x1 + x2 + x3, data = dat)                        # unconstrained, no intercept
mod2 <- lm(y / I(sum(coef(mod1))) ~ 0 + x1 + x2 + x3, data = dat)   # rescale y by the coefficient sum

And now that I think about it some more, skip mod2, just:

coef(mod1)/sum(coef(mod1))
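A quick sanity check of that rescaling on simulated data (`dat` below is invented; note the rescaled weights sum to 1 but are not guaranteed to be nonnegative, and rescaling after the fact is not the same as solving the constrained least-squares problem):

```r
set.seed(1)
dat <- data.frame(x1 = runif(100), x2 = runif(100), x3 = runif(100))
dat$y <- 0.2 * dat$x1 + 0.5 * dat$x2 + 0.3 * dat$x3 + rnorm(100, sd = 0.01)

mod1 <- lm(y ~ 0 + x1 + x2 + x3, data = dat)  # unconstrained, no intercept
w <- coef(mod1) / sum(coef(mod1))             # rescale so weights sum to 1
sum(w)                                        # exactly 1
```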

For the five rows shown, either round(knnPred) or round(gbmPred) gives perfect predictions, so there is some question whether more than one predictor is needed.

At any rate, to solve the given question as stated the following will give nonnegative coefficients that sum to 1 (except possibly for tiny differences due to computer arithmetic). a is the dependent variable and b is a matrix of independent variables. c and d define the equality constraint (coeffs sum to 1) and e and f define the inequality constraints (coeffs are nonnegative).

library(lsei)               # constrained least squares solver (CRAN)
a <- cbind(x1, x2, x3)      # matrix of independent variables
b <- y                      # dependent variable
c <- matrix(c(1, 1, 1), 1)  # equality: c %*% coefficients == d
d <- 1
e <- diag(3)                # inequality: e %*% coefficients >= f
f <- c(0, 0, 0)
lsei(a, b, c, d, e, f)
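lsei is a CRAN package rather than base R. If installing it is not an option, here is a sketch of the same constrained fit using base R's constrOptim, substituting b3 = 1 - b1 - b2 so the equality constraint holds by construction, and encoding b1 >= 0, b2 >= 0, b1 + b2 <= 1 as linear inequalities (the data here is simulated for illustration):

```r
set.seed(7)
n <- 200
x <- cbind(x1 = runif(n), x2 = runif(n), x3 = runif(n))
y <- x %*% c(0.2, 0.5, 0.3) + rnorm(n, sd = 0.01)

mse <- function(b) {               # b = (b1, b2); b3 is implied
  b3 <- 1 - sum(b)
  mean((x %*% c(b, b3) - y) ^ 2)
}

# constrOptim requires feasibility as ui %*% b - ci >= 0:
# rows encode b1 >= 0, b2 >= 0, and -(b1 + b2) >= -1
ui <- rbind(c(1, 0), c(0, 1), c(-1, -1))
ci <- c(0, 0, -1)

fit <- constrOptim(c(0.3, 0.3), mse, grad = NULL, ui = ui, ci = ci)
b <- c(fit$par, 1 - sum(fit$par))  # nonnegative, sums to 1
b
```

The starting point must be strictly inside the feasible region (here c(0.3, 0.3) is), since constrOptim uses a barrier method.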
