
How to add multiple columns to a dataframe from a custom function in R

I've written code that takes an input vector, creates a dataframe based on the input, optimises some values and returns some of them. I'm now turning this into a function that applies the calculations row-wise to an input dataframe. Below is a minimal working example of what I would like to achieve (my actual function would be too long to share here!):

# Randomly generated dataframe
df <-  data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Function that takes multiple arguments and returns multiple values in a list
zsummary <- function(x, y) { 
  if (y < 0) return(list(NA, NA))
  z = rnorm(10, x, abs(y))
  return(list(mean(z), sd(z)))
}

# Example of something that works using dplyr
#    However, this results in a lot of function calls...
#    especially if there were a lot of columns in the list...
library(dplyr)
df %>% rowwise() %>%
  mutate(mean = zsummary(x,y)[[1]], sd = zsummary(x,y)[[2]])

As you can see, I can't apply separate functions to the new df$mean and df$sd columns, because both depend on a z vector that should only be generated once. I've looked around on SO already, but I haven't been able to find an answer yet. I think a solution would use one of the apply functions rather than something from dplyr, but I've honestly never fully understood the apply functions. I would also prefer to avoid solutions that use for loops with rbind, as I've tried this in previous projects and it becomes very slow for large dataframes!

We can use mapply for this. Because zsummary takes two arguments, mapply is a natural fit: it passes the corresponding elements of 'x' and 'y' to zsummary, one pair at a time.

t(mapply(zsummary, df$x, df$y))
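
To actually attach the results to the dataframe as new columns, one possible sketch is below. It assumes a variant of zsummary that returns a named numeric vector instead of a list (with the list-returning version, mapply produces a matrix whose cells are list elements, which is awkward to bind). The seed value and the name df_out are only illustrative.

```r
# Illustrative seed so the sketch is reproducible
set.seed(42)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Assumed variant: returns a named numeric vector rather than a list
zsummary <- function(x, y) {
  if (y < 0) return(c(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, x, abs(y))
  c(mean = mean(z), sd = sd(z))
}

# mapply calls zsummary once per row, so z is generated only once per row.
# The result is a 2 x 10 numeric matrix; t() turns it into 10 x 2.
res <- t(mapply(zsummary, df$x, df$y))

# Bind the two new columns onto the original dataframe
df_out <- cbind(df, res)
head(df_out)
```

Because the function returns a named vector, the new columns pick up the names mean and sd automatically.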

We can also change the function slightly so that it returns a one-row data.frame, and get the output with dplyr:

zsummary <- function(x, y) { 
   if (y < 0) return(data.frame(mean = NA, sd = NA))
   z = rnorm(10, x, abs(y))
   data.frame(mean = mean(z), sd = sd(z))
}

 df %>%
     rowwise() %>% 
     do(data.frame(., zsummary(.$x, .$y)))
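
Since do() is superseded in dplyr 1.0.0 and later, a rough equivalent for newer versions might look like the sketch below. It relies on mutate() unpacking an unnamed data frame result into several columns, and assumes dplyr >= 1.0.0 is installed. The function name zsummary_df, the seed and the name out are hypothetical and chosen only for this example.

```r
library(dplyr)

# Illustrative seed so the sketch is reproducible
set.seed(1)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Hypothetical name; same logic as the data.frame-returning zsummary
zsummary_df <- function(x, y) {
  if (y < 0) return(data.frame(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, x, abs(y))
  data.frame(mean = mean(z), sd = sd(z))
}

out <- df %>%
  rowwise() %>%
  # An unnamed data frame result is spliced into columns 'mean' and 'sd';
  # the function is called once per row
  mutate(zsummary_df(x, y)) %>%
  ungroup()

out
```

This keeps a single call per row, which was the concern with indexing zsummary(x, y) twice inside mutate().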

Or, as we discussed in the comments, instead of having the function take multiple arguments, give it a single argument and use apply with MARGIN = 1 to apply it to each row.

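zsummary2 <- function(v1) {
  # v1 is one row of the input: v1[1] is x, v1[2] is y
  if (v1[2] < 0) return(c(mean = NA, sd = NA))
  z <- rnorm(10, v1[1], abs(v1[2]))
  # Summarise the simulated vector z, not the input row v1
  c(mean = mean(z), sd = sd(z))
}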

t(apply(df[-1], 1, zsummary2))
#         mean        sd
# [1,]  1.403066 0.8757504
# [2,]  5.058188 5.1401507
# [3,]  4.288365 1.4194393
# [4,]  1.932829 6.7587054
# [5,] -1.864236 3.7587462
# [6,]        NA        NA
# [7,]  3.328629 1.3711950
# [8,] -2.347699 5.0449958
# [9,]  2.936615 1.7332283
#[10,]        NA        NA

NOTE: The values will differ on each run because no seed was set before calling rnorm.
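
If reproducible output is needed, a minimal sketch is to call set.seed() before the random draws. The seed values and the names r1 and r2 below are arbitrary and only for illustration.

```r
zsummary2 <- function(v1) {
  if (v1[2] < 0) return(c(mean = NA, sd = NA))
  z <- rnorm(10, v1[1], abs(v1[2]))
  c(mean = mean(z), sd = sd(z))
}

# Fix the input data (arbitrary seed)
set.seed(123)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Resetting the same seed before each run gives identical results
set.seed(1)
r1 <- t(apply(df[c("x", "y")], 1, zsummary2))
set.seed(1)
r2 <- t(apply(df[c("x", "y")], 1, zsummary2))

identical(r1, r2)  # TRUE
```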
