
How to replace values in multiple columns with fitted values in R

I have a data frame called new.cars. I need to fit a linear regression to every column of this data frame. There are thousands of columns in new.cars, so writing each of them into the formula by hand is not possible. Four columns, PCA1 to PCA4, are the predictors: they stay the same in the formula for every other (non-PCA) column, and those other columns are the ones I want to apply the formula to.

The formula for the first column (mercedes) is:

fit1 <- lm(mercedes ~ PCA1 + PCA2 + PCA3 + PCA4, data = new.cars)
new.cars[, "mercedes"] <- fit1$fitted.values

and so on for all the other car columns. What is the best way to replace the column values with the fitted values while leaving the NA values untouched? The NAs are empty cells and should not be fitted.

  new.cars<- structure(list(mercedes = c(1, 1, 1, 1), vw = c(1, 2, 0, NA), 
            camry = c(2, 0, 0, NA), civic = c(4, 1, 1, 1), ferari = c(2, 
            2, 2, 0), PCA1 = c(0.021122, 0.019087, 0.022184, 0.021464
            ), PCA2 = c(0.023872, 0.024295, 0.022471, 0.027509), PCA3 = c(0.000784, 
            0.001996, 0.003911, 0.006119), PCA4 = c(-0.004811, -0.003296, 
            0.001868, -0.001636)), .Names = c("mercedes", "vw", "camry", 
        "civic", "ferari", "PCA1", "PCA2", "PCA3", "PCA4"), row.names = c("S05-F13-P01.GT", 
        "S08-F10-P01.GT", "S08-F11-P01.GT", "S09-F66-P01.GT"), class = "data.frame")
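To see why the NA cells need special handling, here is a minimal sketch on the example data (rebuilt with data.frame() so the snippet runs on its own). It assumes the global na.action option is still R's default, na.omit: lm() then drops the incomplete row, so the fitted values for vw have three elements instead of four and cannot be assigned straight back into the column. Passing na.action = na.exclude makes fitted() put an NA back in the dropped row's position. reformulate() is used here as an alternative to pasting the formula string; the names f_vw, fit_omit and fit_excl are only illustrative.

```r
# Same example data as in the question, built with data.frame()
new.cars <- data.frame(
  mercedes = c(1, 1, 1, 1),
  vw       = c(1, 2, 0, NA),
  camry    = c(2, 0, 0, NA),
  civic    = c(4, 1, 1, 1),
  ferari   = c(2, 2, 2, 0),
  PCA1 = c(0.021122, 0.019087, 0.022184, 0.021464),
  PCA2 = c(0.023872, 0.024295, 0.022471, 0.027509),
  PCA3 = c(0.000784, 0.001996, 0.003911, 0.006119),
  PCA4 = c(-0.004811, -0.003296, 0.001868, -0.001636),
  row.names = c("S05-F13-P01.GT", "S08-F10-P01.GT",
                "S08-F11-P01.GT", "S09-F66-P01.GT")
)

# Build "vw ~ PCA1 + PCA2 + PCA3 + PCA4" without typing it out
f_vw <- reformulate(paste0("PCA", 1:4), response = "vw")

# Default na.action (assumed to be na.omit): the NA row is dropped
fit_omit <- lm(f_vw, data = new.cars)
length(fitted(fit_omit))   # 3, one short of nrow(new.cars)

# na.exclude: fitted() re-inserts NA where the response was missing
fit_excl <- lm(f_vw, data = new.cars, na.action = na.exclude)
fitted(fit_excl)           # length 4, NA in the last position
```

The length mismatch in the first fit is what prevents writing new.cars[, "vw"] <- fit$fitted.values directly for columns that contain NAs.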

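One approach is to loop through the names of the new.cars columns that are not PCA columns (here the first five), build a formula with paste() using the PCA variables as the independent variables, and collect the fitted values in a list (lst). We then create a new dataset (new1.cars) by subsetting the non-PCA columns from new.cars. Because lm() drops rows containing NA by default, some list elements are shorter than others; we pad those with NA up to the maximum length in lst and assign the result to the new dataset. The padding appends the NAs at the end, so it only lines up correctly when the missing values are in the last rows, as in the example data.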

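# Fit one model per car column (columns 1:5 here) and keep its fitted values
lst <- lapply(names(new.cars)[1:5], function(x)
  lm(formula(paste(x, "~", paste0("PCA", 1:4, collapse = "+"))), data = new.cars)$fitted.values)
# Keep only the car columns, then pad shorter vectors with trailing NAs up to the longest one
new1.cars <- new.cars[1:5]
new1.cars[] <- lapply(lst, `length<-`, max(lengths(lst)))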
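A sketch of a variant that does not depend on the NAs being in the last rows, assuming the predictors are always named PCA1 to PCA4: select the car columns by name with setdiff() rather than by position, fit each model with na.action = na.exclude, and write the fitted values back in place. With na.exclude, fitted() returns one value per row of new.cars, with NA wherever the response was NA, so no padding step is needed. pcs and car_cols are illustrative names, not part of the original answer.

```r
new.cars <- data.frame(
  mercedes = c(1, 1, 1, 1),
  vw       = c(1, 2, 0, NA),
  camry    = c(2, 0, 0, NA),
  civic    = c(4, 1, 1, 1),
  ferari   = c(2, 2, 2, 0),
  PCA1 = c(0.021122, 0.019087, 0.022184, 0.021464),
  PCA2 = c(0.023872, 0.024295, 0.022471, 0.027509),
  PCA3 = c(0.000784, 0.001996, 0.003911, 0.006119),
  PCA4 = c(-0.004811, -0.003296, 0.001868, -0.001636),
  row.names = c("S05-F13-P01.GT", "S08-F10-P01.GT",
                "S08-F11-P01.GT", "S09-F66-P01.GT")
)

pcs      <- paste0("PCA", 1:4)              # predictors shared by every model
car_cols <- setdiff(names(new.cars), pcs)   # every column that is not a PCA

new1.cars <- new.cars
new1.cars[car_cols] <- lapply(car_cols, function(col) {
  fit <- lm(reformulate(pcs, response = col), data = new.cars,
            na.action = na.exclude)         # dropped rows come back as NA in fitted()
  unname(fitted(fit))                       # one value per row, row names dropped
})
```

Selecting by name means the same code works unchanged with thousands of car columns, as long as the PCA columns keep their names; the PCA columns themselves are copied over untouched.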

Update

If some columns contain only NA values, lm() stops with an error for them, so we can add a check that skips the model for those columns and returns NA instead:

lst <- lapply(names(new.cars)[1:5], function(x) {
  x1 <- new.cars[[x]]
  # An all-NA column has nothing to fit, so return NA instead of calling lm()
  if (all(is.na(x1))) {
    NA
  } else {
    lm(formula(paste(x, "~", paste0("PCA", 1:4, collapse = "+"))),
       data = new.cars)$fitted.values
  }
})

The rest of the steps are the same as above.
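The all-NA guard from the update can also be combined with the na.exclude approach. In this sketch, audi is a hypothetical all-NA column added only to exercise the guard: it is returned unchanged instead of being passed to lm(), which would otherwise fail because no complete rows remain. fit_or_na is likewise an illustrative helper name.

```r
new.cars <- data.frame(
  mercedes = c(1, 1, 1, 1),
  vw       = c(1, 2, 0, NA),
  camry    = c(2, 0, 0, NA),
  civic    = c(4, 1, 1, 1),
  ferari   = c(2, 2, 2, 0),
  PCA1 = c(0.021122, 0.019087, 0.022184, 0.021464),
  PCA2 = c(0.023872, 0.024295, 0.022471, 0.027509),
  PCA3 = c(0.000784, 0.001996, 0.003911, 0.006119),
  PCA4 = c(-0.004811, -0.003296, 0.001868, -0.001636),
  row.names = c("S05-F13-P01.GT", "S08-F10-P01.GT",
                "S08-F11-P01.GT", "S09-F66-P01.GT")
)
new.cars$audi <- NA_real_   # hypothetical all-NA column, for illustration only

pcs      <- paste0("PCA", 1:4)
car_cols <- setdiff(names(new.cars), pcs)

# Hypothetical helper: fitted values for one column, or the column itself if all NA
fit_or_na <- function(col) {
  y <- new.cars[[col]]
  if (all(is.na(y))) return(y)              # nothing to fit: leave it as is
  fit <- lm(reformulate(pcs, response = col), data = new.cars,
            na.action = na.exclude)
  unname(fitted(fit))
}

new1.cars <- new.cars
new1.cars[car_cols] <- lapply(car_cols, fit_or_na)
```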
