
R: Using dplyr to Mutate Multiple Columns

There are various questions on Stack Overflow regarding this, but I have been unable to find a solution to my question, which follows.

Suppose I have a data frame (or tibble) df with two columns, say X1 and X2. I have a function, say f, which takes inputs X1 and X2 and outputs a vector, say [V1, V2]. Now, if the output were a singleton, then I would be able to write

df %>% mutate(V = f(X1,X2))

to add a column labelled V to my df, and the entry would be f(X1,X2). However, I want to add two columns, V1 and V2. I do not know how to do this.

Of course, I could do something like

df %>% mutate(V1 = f(X1,X2)[1], V2 = f(X1,X2)[2]),

but this (I assume) involves calling the function f twice; I have a large data set, and would rather not call it twice. Alternatively, I could do

df %>% mutate(V_list = as.list(f(X1,X2)), V1 = V_list[[1]], V2 = V_list[[2]]) %>% select(-V_list),

but this seems like a rather clunky way, and I'd rather not.

Further, I would eventually like to apply this to a grouped tibble, where the naive way of writing this would duplicate V_list for each entry in the group. As such, ideally any answer would be 'vectorisable', in the following sense. Suppose I have done df %>% group_by(var1) and have a function f which takes a data frame with two columns as its input (this should be thought of as 'a vector of pairs') and outputs a new data frame with two columns.


Here is some code to set up the example.

library(dplyr)
df = tibble(var1 = c(1,1,2,2), X1 = c(1,2,3,4), X2 = c(5,6,7,8))
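# f takes a two-column data frame (columns X1 and X2) and an exponent var
f = function(sub_df, var){ return( data.frame(x1 = (sub_df$X1 + sub_df$X2)^var, x2 = (sub_df$X1 - sub_df$X2)^var) ) }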

With tidyr 1.0.0 or later, you can use unnest_wider.

Modify the function so that its output is named:

f = function(x1,x2) c(a = x1 + x2, b = x1 - x2)

Create a new column which is a list containing a vector for each row, then apply unnest_wider to this column to split the vector elements into their own columns.

library(purrr)  # provides map2()
library(tidyr)  # provides unnest_wider()

df %>%
  mutate(new = map2(X1, X2, f)) %>%
  unnest_wider(new)
# # A tibble: 4 x 5
#    var1    X1    X2     a     b
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     1     1     5     6    -4
# 2     1     2     6     8    -4
# 3     2     3     7    10    -4
# 4     2     4     8    12    -4
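
For the grouped, data-frame-in/data-frame-out case described in the question, one possible sketch relies on the fact that, in dplyr 1.0.0 and later, mutate() unpacks an unnamed data-frame result into separate columns. The helper name f_df and the output column names V1 and V2 are illustrative assumptions, not part of the original post, and the exponent 2 is chosen arbitrarily.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

df <- tibble(var1 = c(1, 1, 2, 2), X1 = c(1, 2, 3, 4), X2 = c(5, 6, 7, 8))

# Hypothetical vectorised helper: takes a data frame with columns X1 and X2
# and returns a two-column tibble with one row per input row.
f_df <- function(sub_df, var) {
  tibble(
    V1 = (sub_df$X1 + sub_df$X2)^var,
    V2 = (sub_df$X1 - sub_df$X2)^var
  )
}

result <- df %>%
  group_by(var1) %>%
  # The data-frame result is left unnamed, so mutate() splices its columns
  # (V1 and V2) into the output; f_df runs once per group, not once per row.
  mutate(f_df(tibble(X1 = X1, X2 = X2), var = 2)) %>%
  ungroup()
```

Because f_df receives whole columns for each group, this avoids both the intermediate list column and the per-row calls made by map2.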

This may not be an ideal solution, but I have faced this situation and this is what I usually do: return a delimiter-separated string from the function, then split that column on the delimiter.

f = function(x1,x2){ return( toString(c(x1+x2, x1-x2))) }

library(tidyverse)

df %>%
  mutate(new = map2_chr(X1, X2, f)) %>%
  separate(new, c("col1", "col2"), sep = ",", convert = TRUE)

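# Output shown for a two-row df with X1 = c(1, 2) and X2 = c(3, 4), not the four-row df from the question:
# A tibble: 2 x 4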
#     X1    X2  col1  col2
#  <dbl> <dbl> <int> <int>
#1     1     3     4    -2
#2     2     4     6    -2
