Replicating result from dplyr::mutate_at() using base R

I am trying to replicate the result of dplyr::mutate_at() using base R. I am fairly new to writing functions, and I would like to know (a) whether the function I came up with is reasonable and (b) how I can move the cbind() call inside the function while still keeping all variables from the diamonds dataset.

First the dplyr::mutate_at() call:

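library(tidyverse)

# funs() is deprecated since dplyr 0.8.0; a named list of formulas
# produces the same x_relative, y_relative, z_relative columns
diamonds %>% 
  mutate_at(.vars = c("x", "y", "z"), .funs = list(relative = ~ . / price))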
# A tibble: 53,940 x 13
   #carat cut       color clarity depth table price     x     y     z x_relative y_relative z_relative
   #<dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
 #1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     0.0121     0.0122    0.00745
 #2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     0.0119     0.0118    0.00709
 #3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     0.0124     0.0124    0.00706
 #4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     0.0126     0.0127    0.00787
 #5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     0.0130     0.0130    0.00821
 #6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     0.0117     0.0118    0.00738
 #7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47     0.0118     0.0118    0.00735
 #8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53     0.0121     0.0122    0.00751
 #9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49     0.0115     0.0112    0.00739
#10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39     0.0118     0.0120    0.00707
# ... with 53,930 more rows

This is the function I came up with to replicate the result in base R:

rel_fun <- function(x, y){
  out <- x / y
  colnames(out) <- (paste(colnames(x), "relative", sep = "_"))
  out
}

And here the result:

df_out <- rel_fun(diamonds[c("x", "y", "z")], diamonds$price)
df_out2 <- cbind(diamonds, df_out)
head(df_out2)
  #carat       cut color clarity depth table price    x    y    z x_relative y_relative  z_relative
#1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43 0.01211656 0.01220859 0.007453988
#2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31 0.01193252 0.01177914 0.007085890
#3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31 0.01238532 0.01244648 0.007064220
#4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63 0.01257485 0.01266467 0.007874251
#5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75 0.01295522 0.01298507 0.008208955
#6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48 0.01172619 0.01178571 0.007380952

This works, but as mentioned above: how can I keep all variables of the diamonds dataset while calling cbind() inside the function?

I tried the following, but the result lacks the other variables of the diamonds dataset because I only pass the function the columns needed for the calculation, i.e. diamonds[c("x", "y", "z")]. Is there a way to change the function so that it keeps the other variables of the original dataset?

rel_fun <- function(x, y){
  out <- x / y
  colnames(out) <- (paste(colnames(x), "relative", sep = "_"))
  out2 <- cbind(x, out)
  out2
}

df_out3 <- rel_fun(diamonds[c("x", "y", "z")], diamonds$price)
head(df_out3)
#     x    y    z x_relative y_relative  z_relative
#1 3.95 3.98 2.43 0.01211656 0.01220859 0.007453988
#2 3.89 3.84 2.31 0.01193252 0.01177914 0.007085890
#3 4.05 4.07 2.31 0.01238532 0.01244648 0.007064220
#4 4.20 4.23 2.63 0.01257485 0.01266467 0.007874251
#5 4.34 4.35 2.75 0.01295522 0.01298507 0.008208955
#6 3.94 3.96 2.48 0.01172619 0.01178571 0.007380952  

The pipe operator %>% implicitly passes the data frame diamonds as the first argument to mutate_at(). To mimic that behavior, your function needs to take the data frame as its first argument as well. Because the function then receives the entire data frame, x can simply be the vector of column names:

rel_fun <- function(.data, x, y){
  # Divide each selected column elementwise by y
  out <- .data[x] / y
  colnames(out) <- paste(x, "relative", sep = "_")
  # Append the new columns to the full data frame
  cbind(.data, out)
}

rel_fun( diamonds, c("x", "y", "z"), diamonds$price )    # Works as desired
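If you also want to avoid repeating diamonds$ for the denominator, one possible variation is to pass the denominator as a column name too. The sketch below is only an illustration, not part of the original answer: the name rel_fun2, the arguments vars, by and suffix, and the small toy data frame are all made up for this example. It relies on the fact that dividing a data frame by a vector with one element per row divides every column elementwise.

```r
# Hypothetical variant of rel_fun(): both the columns to transform (vars)
# and the denominator column (by) are given as column names.
rel_fun2 <- function(.data, vars, by, suffix = "relative") {
  # Fail early if .data is not a data frame or a column is missing
  stopifnot(is.data.frame(.data), all(c(vars, by) %in% names(.data)))

  # Data frame / vector: each selected column is divided elementwise
  out <- .data[vars] / .data[[by]]
  names(out) <- paste(vars, suffix, sep = "_")

  # Keep every original column and append the new ones
  cbind(.data, out)
}

# Small made-up data frame standing in for diamonds
toy <- data.frame(x = c(2, 4), y = c(6, 8), price = c(2, 4))
res <- rel_fun2(toy, c("x", "y"), "price")
print(res)
```

With the real data, the equivalent call would be rel_fun2(diamonds, c("x", "y", "z"), "price"). Note that cbind() on a tibble and a data frame returns a plain data frame, so wrap the result in tibble::as_tibble() if you want tibble printing back.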
