Here is data:
library(tidyverse)
col_pre <- c('a', 'b', 'c')
df <- tibble(a1 = 1:3, a2 = 4:6, b1 = 7:9, b2 = 10:12, c1 = 13:15, c2 = 16:18)
I want to use purrr::map()
and dplyr::mutate()
to create three new columns that are the sums of columns in df
. I can use map()
to iterate over a vector of the a, b, c column prefixes. I figured out the tidyeval
operations so that the code below runs without error.
out <- col_pre %>%
map_df(~ df %>%
mutate(!!as.name(paste0(.x, '3')) := !!as.name(paste0(.x, '1')) + !!as.name(paste0(.x, '2')))
)
However, out
now has six spurious rows:
a1 a2 b1 b2 c1 c2 a3 b3 c3
1 1 4 7 10 13 16 5 NA NA
2 2 5 8 11 14 17 7 NA NA
3 3 6 9 12 15 18 9 NA NA
4 1 4 7 10 13 16 NA 17 NA
5 2 5 8 11 14 17 NA 19 NA
6 3 6 9 12 15 18 NA 21 NA
7 1 4 7 10 13 16 NA NA 29
8 2 5 8 11 14 17 NA NA 31
9 3 6 9 12 15 18 NA NA 33
What it's done is unnecessarily replicate the three rows of the input df
.
Here is the output I want:
a1 a2 b1 b2 c1 c2 a3 b3 c3
1 1 4 7 10 13 16 5 17 29
2 2 5 8 11 14 17 7 19 31
3 3 6 9 12 15 18 9 21 33
I have a feeling purrr::reduce()
could be the solution, but I'm unsure how to apply it.
Any help is appreciated!
We can convert the strings to sym
bol before doing the evaluation, instead of mutate
use transmute
and later bind the columns with the original dataset
library(stringr)
library(purrr)
library(dplyr)
col_pre %>%
map_dfc(~ df %>%
transmute(!! str_c(.x, '3') := !! rlang::sym(str_c(.x, '1')) +
!! rlang::sym(str_c(.x, 2)))) %>%
bind_cols(df, .)
# A tibble: 3 x 9
# a1 a2 b1 b2 c1 c2 a3 b3 c3
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 4 7 10 13 16 5 17 29
#2 2 5 8 11 14 17 7 19 31
#3 3 6 9 12 15 18 9 21 33
Or another option is parse_exprs
df %>%
mutate(!!! rlang::parse_exprs(str_c(sprintf("%s1 + %s2",
col_pre, col_pre), collapse=";"))) %>%
rename_at(vars(contains("+")), ~ str_c(col_pre, 3))
# A tibble: 3 x 9
# a1 a2 b1 b2 c1 c2 a3 b3 c3
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 4 7 10 13 16 5 17 29
#2 2 5 8 11 14 17 7 19 31
#3 3 6 9 12 15 18 9 21 33
Or another option is to convert it to 'long' format with pivot_longer
and then do the calculation
library(tidyr)
df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = c(".value", "group"),
names_sep ="(?<=[a-z])(?=[0-9])") %>%
group_by(rn) %>%
summarise_at(vars(col_pre), list(`3` = sum)) %>%
select(-rn) %>%
bind_cols(df, .)
Or if we use the devel
version of dplyr
( '0.8.99.9000'
), then across
can be used along with summarise
df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = c(".value", "group"),
names_sep ="(?<=[a-z])(?=[0-9])") %>%
group_by(rn) %>%
summarise(across(col_pre, sum)) %>%
select(-rn) %>%
rename_all(~ str_c(., 3)) %>%
bind_cols(df, .)
# A tibble: 3 x 9
# a1 a2 b1 b2 c1 c2 a3 b3 c3
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 4 7 10 13 16 5 17 29
#2 2 5 8 11 14 17 7 19 31
#3 3 6 9 12 15 18 9 21 33
We can use map_dfc
with transmute
library(dplyr)
library(purrr)
bind_cols(df, map_dfc(col_pre, ~df %>%
transmute(!!paste0(.x, 3) := !!sym(paste0(.x, 1)) + !!sym(paste0(.x, 2)))))
# A tibble: 3 x 9
# a1 a2 b1 b2 c1 c2 a3 b3 c3
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 4 7 10 13 16 5 17 29
#2 2 5 8 11 14 17 7 19 31
#3 3 6 9 12 15 18 9 21 33
In base R, we can use split.default
df[paste0(col_pre, 3)] <- lapply(split.default(df,
sub('\\d', '', names(df))), rowSums)
Or without splitting, we can subset data based on start name of column as @thelatemail suggested
df[paste0(col_pre,3)] <- lapply(col_pre, function(x)
rowSums(df[startsWith(names(df), x)]))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.