lapply() to use a function over multiple columns of a dataframe

I am tracking the body weights of individuals over time, and the function below allows me to calculate the percent body weight of an individual on a particular day, relative to the initial value (essentially dividing the body weight on a particular day by the body weight observed on day 1).

variability <- function(df, column_number) {
  variable_name <- paste0("var_BW", column_number)

  df %>%
    mutate(!!variable_name := round(100 * (df[, column_number] / df[1, column_number]), 1))
}

This function works fine if I use it on one column, but since I have a number of individuals, I would like to use the apply() family to use the function on multiple columns of one dataframe (for instance on columns 1:8 of the dataframe below):

   BW1  BW2  BW3  BW4  BW5  BW6  BW7  BW8
1 18.4 19.6 20.7 17.4 18.7 18.9 19.0 17.8
2 18.1 19.3 20.0 17.5 18.3 19.4 19.5 18.0
3 17.7 18.9 20.4 17.3 18.3 19.2 19.3 17.9

My initial guess is to store the column numbers in a list, and then pass that list as an argument in the lapply() function, as such:

l <- list(1:8)
lapply(working_df, variability, l)

However, when I do that, I get the following error:

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "c('double', 'numeric')" 

Any thoughts?
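
One likely reading of the error: `lapply()` on a data frame iterates over its columns, so each call to `variability()` receives a single numeric vector as `df` (and the list `l` as `column_number`), which `mutate()` cannot handle. Below is a minimal sketch of one way around it, assuming a base-R stand-in for the function (`variability_base` is a hypothetical name, not part of the original code) and folding over the column numbers with `Reduce()`:

```r
# Example data from the question
working_df <- data.frame(
  BW1 = c(18.4, 18.1, 17.7), BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4), BW4 = c(17.4, 17.5, 17.3),
  BW5 = c(18.7, 18.3, 18.3), BW6 = c(18.9, 19.4, 19.2),
  BW7 = c(19.0, 19.5, 19.3), BW8 = c(17.8, 18.0, 17.9)
)

# lapply()/vapply() hand each *column* to the function, not the data frame
column_classes <- vapply(working_df, class, character(1))

# Hypothetical base-R stand-in for variability(): adds one var_BW<n> column
variability_base <- function(df, column_number) {
  variable_name <- paste0("var_BW", column_number)
  baseline <- df[[column_number]][1]
  df[[variable_name]] <- round(100 * df[[column_number]] / baseline, 1)
  df
}

# Fold over the column numbers, passing the growing data frame to each call
result <- Reduce(variability_base, 1:8, init = working_df)
```

`Reduce()` calls the function as `f(df_so_far, column_number)` for each element of `1:8`, so the function keeps its original two-argument shape and the data frame is always the first argument.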

Does this fit?
Since the relative-percentage calculation can be vectorized, we can simplify things greatly.

bw <- read.table(text="
 BW1  BW2  BW3  BW4  BW5  BW6  BW7  BW8
1 18.4 19.6 20.7 17.4 18.7 18.9 19.0 17.8
2 18.1 19.3 20.0 17.5 18.3 19.4 19.5 18.0
3 17.7 18.9 20.4 17.3 18.3 19.2 19.3 17.9", header=TRUE)

apply(bw, 2, function(x) round(100*x/x[1], 1))
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6

Or using sweep()

round(sweep(bw, 2, unlist(bw[1,]), "/")*100, 1)
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6

Or even simpler

round(100 * t(t(bw) / as.matrix(bw)[1,]), 1)
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6
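
The transpose version works because R recycles a vector down the columns of a matrix. After `t(bw)` each row is one individual, so dividing by the vector of day-1 weights pairs each individual with its own baseline. A small sketch that checks this against the `apply()` version, using only the first three columns (`m`, `first_day`, `via_transpose` and `via_apply` are illustrative names):

```r
bw <- data.frame(
  BW1 = c(18.4, 18.1, 17.7), BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4)
)

m <- as.matrix(bw)     # 3 days x 3 individuals
first_day <- m[1, ]    # one baseline per individual

# t(m) is individuals x days; the baseline vector is recycled down each
# column, so row i is always divided by first_day[i]
via_transpose <- round(100 * t(t(m) / first_day), 1)

# Same calculation, one column at a time
via_apply <- apply(m, 2, function(x) round(100 * x / x[1], 1))
```

Dividing `m` directly by `first_day` without transposing would recycle the baselines down the days instead, which only gives the intended result by coincidence when the matrix is square.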

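You don't really need apply in this case. Note that the expression below divides every column by the first column (BW1) within each row, so it gives each weight relative to BW1 on the same day rather than relative to day 1.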

pctvals <- round(100.0 * bw[,1:ncol(bw)] / bw[,1], 2)

yields

  BW1    BW2    BW3   BW4    BW5    BW6    BW7    BW8
1 100 106.52 112.50 94.57 101.63 102.72 103.26  96.74
2 100 106.63 110.50 96.69 101.10 107.18 107.73  99.45
3 100 106.78 115.25 97.74 103.39 108.47 109.04 101.13
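
For comparison, here is a sketch that stays close to the `lapply()` idea in the question title and divides each column by its own first value (day 1). Only three of the columns are reproduced, and `pct_of_day1` and `pct_of_bw1` are illustrative names:

```r
bw <- data.frame(
  BW1 = c(18.4, 18.1, 17.7), BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4)
)

# lapply() visits each column as a numeric vector x; x[1] is its day-1 value
pct_of_day1 <- as.data.frame(lapply(bw, function(x) round(100 * x / x[1], 2)))

# Dividing by bw[, 1] instead uses BW1 as the denominator in every row
pct_of_bw1 <- round(100 * bw / bw[, 1], 2)
```

The two results answer different questions: `pct_of_day1` tracks each individual against its own starting weight, while `pct_of_bw1` compares individuals with BW1 on the same day.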

There's a super simple option in using mutate_at from the dplyr package:

library(dplyr)

working_df <-
  data.frame(BW1 = c(18.4, 18.1, 17.7),
             BW2 = c(19.6, 19.3, 18.9),
             BW3 = c(20.7, 20.0, 20.4))

variability_v2 <- function(df, column_numbers) {

  df %>% 
    mutate_at(vars(column_numbers), funs(var = round(100*(./first(.)), 1)))

}

variability_v2(working_df, 1:3)
#>    BW1  BW2  BW3 BW1_var BW2_var BW3_var
#> 1 18.4 19.6 20.7   100.0   100.0   100.0
#> 2 18.1 19.3 20.0    98.4    98.5    96.6
#> 3 17.7 18.9 20.4    96.2    96.4    98.6

The only two issues with this method (very minor, in my opinion) are:

  • If you only feed a single column number into the function, then the new column is simply called "var"
  • The "var" is appended after the column name, not before it

The former could be dealt with by a simple "if" statement within the function, carving out the situation where there is only one column specified. Hopefully you just don't care about the latter!
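
In current dplyr releases `funs()` is deprecated and `mutate_at()` is superseded by `across()`. Below is a hedged sketch of the same idea with `across()`, assuming dplyr 1.0.0 or later. `variability_v3` is a hypothetical name, and the `.names` argument is used so that the `var_` prefix is applied before the column name, including when only one column is selected:

```r
library(dplyr)

working_df <-
  data.frame(BW1 = c(18.4, 18.1, 17.7),
             BW2 = c(19.6, 19.3, 18.9),
             BW3 = c(20.7, 20.0, 20.4))

# Hypothetical rewrite of variability_v2() using across()
variability_v3 <- function(df, column_numbers) {
  df %>%
    mutate(across(
      all_of(column_numbers),            # select columns by position
      ~ round(100 * .x / first(.x), 1),  # percent of the day-1 value
      .names = "var_{.col}"              # e.g. BW1 -> var_BW1
    ))
}

result_all <- variability_v3(working_df, 1:3)
result_one <- variability_v3(working_df, 2)
```

With this naming template, `result_one` should gain a single column called `var_BW2`, so no special case is needed for one column.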
