I've created code that will take an input vector, create a dataframe based on the input, optimise some values and return some of these values. I'm now turning this into a function that will apply the calculations rowwise on an input dataframe. Below is a minimum working example of what I would like to achieve (my actual function would be too long to share here!):
# Randomly generated dataframe
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))
# Function that takes multiple arguments and returns multiple values in a list
zsummary <- function(x, y) {
  if (y < 0) return(list(NA, NA))
  z <- rnorm(10, x, abs(y))
  return(list(mean(z), sd(z)))
}
# Example of something that works using dplyr
# However, this results in a lot of function calls...
# especially if there were a lot of columns in the list...
library(dplyr)
df %>% rowwise() %>%
  mutate(mean = zsummary(x, y)[[1]], sd = zsummary(x, y)[[2]])
As you can see, I can't apply individual functions to each of the new df$mean and df$sd columns, as they depend on a z vector that can only be generated once. I've looked around on SO already, but I haven't been able to find an answer yet. I think a solution would be to use one of the apply functions rather than something from dplyr, but I've honestly never fully understood the apply functions. I would also prefer to avoid solutions that use for loops with rbind, as I've tried this in previous projects and it becomes very slow for large dataframes!
We can use mapply for this. Because zsummary takes two arguments, mapply is one option: it passes the corresponding elements of 'x' and 'y' to zsummary.
t(mapply(zsummary, df$x, df$y))
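To attach the results to the original dataframe as new columns, one possibility is to have the function return a named numeric vector instead of a list, so that mapply simplifies the output to a numeric matrix. This is only a sketch: zsummary_vec is a hypothetical variant of zsummary introduced here for illustration, and the seed is set only to make the example repeatable.

```r
# Hypothetical variant of zsummary that returns a named numeric vector,
# so that mapply() can simplify the results into a numeric matrix.
zsummary_vec <- function(x, y) {
  if (y < 0) return(c(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, x, abs(y))      # z is generated once per row
  c(mean = mean(z), sd = sd(z))  # both statistics come from the same z
}

set.seed(42)  # assumption: seed chosen arbitrarily for repeatability
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# mapply() returns a 2 x 10 matrix (rows "mean" and "sd"); t() makes it 10 x 2
res <- t(mapply(zsummary_vec, df$x, df$y))

# Bind the two new columns onto the original dataframe
out <- cbind(df, res)
```

Here zsummary_vec is called once per row, so mean and sd are always computed from the same z.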
We can also change the function slightly so that it returns a data.frame, and get the output with dplyr:
zsummary <- function(x, y) {
  if (y < 0) return(data.frame(mean = NA, sd = NA))
  z <- rnorm(10, x, abs(y))
  data.frame(mean = mean(z), sd = sd(z))
}
df %>%
  rowwise() %>%
  do(data.frame(., zsummary(.$x, .$y)))
Or, as we discussed in the comments, instead of having the function take multiple arguments, give it a single argument and use apply with MARGIN = 1 to apply it to each row.
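zsummary2 <- function(v1) {
  # v1 is one row of the input: v1[1] is x, v1[2] is y
  if (v1[2] < 0) return(c(mean = NA, sd = NA))
  z <- rnorm(10, v1[1], abs(v1[2]))
  # summarise the simulated z, not the input row
  c(mean = mean(z), sd = sd(z))
}
# df[-1] drops column 'a', leaving only 'x' and 'y'
t(apply(df[-1], 1, zsummary2))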
#            mean        sd
#  [1,]  1.403066 0.8757504
#  [2,]  5.058188 5.1401507
#  [3,]  4.288365 1.4194393
#  [4,]  1.932829 6.7587054
#  [5,] -1.864236 3.7587462
#  [6,]        NA        NA
#  [7,]  3.328629 1.3711950
#  [8,] -2.347699 5.0449958
#  [9,]  2.936615 1.7332283
# [10,]        NA        NA
NOTE: The values will be different in each run, as we didn't set a seed for rnorm.
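If reproducible output is needed, one option is to call set.seed before the row-wise computation. The sketch below assumes arbitrary seed values and uses zsummary_row, a hypothetical single-argument function written for this illustration; it checks that two runs started from the same seed give identical results.

```r
# Hypothetical single-argument summary function (one row in, two values out)
zsummary_row <- function(v1) {
  if (v1[2] < 0) return(c(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, v1[1], abs(v1[2]))
  c(mean = mean(z), sd = sd(z))
}

set.seed(123)  # assumption: arbitrary seed for generating the data
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Resetting the same seed before each run makes rnorm() draw the same values
set.seed(456)
run1 <- t(apply(df[-1], 1, zsummary_row))

set.seed(456)
run2 <- t(apply(df[-1], 1, zsummary_row))

same <- identical(run1, run2)
```

With the seed reset before each call, same should be TRUE; without it, the two runs would generally differ.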