There are various questions on Stack Overflow regarding this, but I have been unable to find a solution to my question, which follows.
Suppose I have a data frame (or tibble) df with two columns, say X1 and X2. I have a function, say f, which takes inputs X1 and X2 and outputs a vector, say [V1, V2].
Now, if the output were a singleton, then I would be able to write
df %>% mutate(V = f(X1,X2))
to add a column labelled V to my df, and the entry would be f(X1,X2). However, I want to add two columns, V1 and V2, and I do not know how to do this.
Of course, I could do something like
df %>% mutate(V1 = f(X1,X2)[1], V2 = f(X1,X2)[2])
but this (I assume) involves calling the function f twice; I have a large data set and would rather not call it twice. Alternatively, I could do
df %>% mutate(V_list = as.list(f(X1,X2)), V1 = V_list[[1]], V2 = V_list[[2]]) %>% select(-V_list)
but this seems rather clunky, and I'd rather not.
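One way to check the assumption about calling f twice is to count invocations. In the sketch below, the counter calls and the toy f are hypothetical stand-ins. Because mutate() passes whole columns to f, the two indexing expressions each trigger one call. Note also that [1] and [2] then index into the combined result for all rows, not into a per-row result.

```r
library(dplyr)

calls <- 0
# Hypothetical toy f that records how often it is invoked
f <- function(x1, x2) {
  calls <<- calls + 1
  c(x1 + x2, x1 - x2)
}

df <- tibble(X1 = c(1, 2), X2 = c(3, 4))
out <- df %>% mutate(V1 = f(X1, X2)[1], V2 = f(X1, X2)[2])

# calls is now 2: one evaluation of f per indexing expression.
# f(X1, X2) is c(4, 6, -2, -2), so V1 is 4 and V2 is 6 in every row.
```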
Further, I would eventually like to apply this to a grouped tibble, and then the naive way of writing this would duplicate V_list for each entry in the group. As such, ideally any answer would be 'vectorisable', in the following sense. Suppose I have done df %>% group_by(var1) and have a function f which takes a data frame with two columns as its input (this should be thought of as 'a vector of pairs') and outputs a new data frame with two columns.
Here is some code to set up the example:
library(dplyr)
df = tibble(var1 = c(1,1,2,2), X1 = c(1,2,3,4), X2 = c(5,6,7,8))
# sub_df holds the columns X1 and X2; var is the exponent
f = function(sub_df, var){ with(sub_df, data.frame(x1 = (X1+X2)^var, x2 = (X1-X2)^var)) }
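For the grouped, data-frame-in/data-frame-out version, dplyr's group_modify() applies a function to each group's rows and binds the results back together with the group key. This is only a sketch under assumptions. The name f_group and the fixed exponent var = 2 are hypothetical, because the question does not say where var comes from. The output columns are also called V1 and V2 here rather than x1 and x2.

```r
library(dplyr)

df <- tibble(var1 = c(1, 1, 2, 2), X1 = c(1, 2, 3, 4), X2 = c(5, 6, 7, 8))

# Hypothetical per-group function: takes a data frame with columns X1 and X2
# and returns a data frame with two new columns, one row per input row
f_group <- function(sub_df, var) {
  tibble(V1 = (sub_df$X1 + sub_df$X2)^var,
         V2 = (sub_df$X1 - sub_df$X2)^var)
}

out <- df %>%
  group_by(var1) %>%
  # .x is the current group's rows without the grouping column var1;
  # group_modify() adds var1 back in front of the returned columns
  group_modify(~ bind_cols(.x, f_group(.x, var = 2))) %>%
  ungroup()
```

f_group is called once per group on all of that group's rows, which matches the 'vector of pairs' idea in the question.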
With tidyr 1.0.0 or later you can use unnest_wider. First, modify the function so that its output is named:
f = function(x1,x2) c(a = x1 + x2, b = x1 - x2)
Then create a new column which is a list containing a vector for each row, and apply unnest_wider to this column to split the vector elements into their own columns.
library(purrr)  # map2()
library(tidyr)  # unnest_wider(), tidyr >= 1.0.0
df %>%
  mutate(new = map2(X1, X2, f)) %>%
  unnest_wider(new)
# # A tibble: 4 x 5
#    var1    X1    X2     a     b
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     1     1     5     6    -4
# 2     1     2     6     8    -4
# 3     2     3     7    10    -4
# 4     2     4     8    12    -4
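To address the concern about calling f more than once, the sketch below adds a hypothetical call counter to the named f. It also keeps the intermediate list column so it can be inspected. map2() calls f exactly once per row, and unnest_wider() only reshapes the stored results without calling f again.

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- tibble(var1 = c(1, 1, 2, 2), X1 = c(1, 2, 3, 4), X2 = c(5, 6, 7, 8))

calls <- 0
f <- function(x1, x2) {
  calls <<- calls + 1           # hypothetical counter, for illustration only
  c(a = x1 + x2, b = x1 - x2)   # names become the new column names
}

# Intermediate step: one named vector per row in the list column `new`
with_list <- df %>% mutate(new = map2(X1, X2, f))

# Split each stored vector into columns a and b (no further calls to f)
out <- with_list %>% unnest_wider(new)
```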
This may not be an ideal solution, but I have faced this situation and this is what I usually do: return a delimiter-separated string from the function, then separate the column based on that delimiter.
f = function(x1,x2){ return( toString(c(x1+x2, x1-x2))) }
library(tidyverse)
df %>%
mutate(new = map2_chr(X1, X2, f)) %>%
separate(new, c("col1", "col2"), sep = ",", convert = TRUE)
# Output for a two-row df with X1 = c(1, 2) and X2 = c(3, 4):
# A tibble: 2 x 4
#      X1    X2  col1  col2
#   <dbl> <dbl> <int> <int>
# 1     1     3     4    -2
# 2     2     4     6    -2
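A short sketch of the string round trip that this approach relies on, using the same f as above. toString() builds the string through as.character(), which writes doubles with 15 significant digits. Integer-valued results such as these come back exactly, but non-integer results may not survive the round trip unchanged. This is one reason the approach is a workaround rather than an ideal solution.

```r
# Same delimiter-based f as in the answer above
f <- function(x1, x2) toString(c(x1 + x2, x1 - x2))

s <- f(1, 3)                                         # "4, -2"
parts <- as.numeric(strsplit(s, ", ", fixed = TRUE)[[1]])

# A non-integer value is written with 15 significant digits,
# so parsing it back gives a slightly different double
third <- as.numeric(toString(1/3))
```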