For example, is it possible to do this in dplyr:
new_name <- "Sepal.Sum"
col_grep <- "Sepal"
iris <- cbind(iris, tmp_name = rowSums(iris[,grep(col_grep, names(iris))]))
names(iris)[names(iris) == "tmp_name"] <- new_name
This adds up all the columns that contain "Sepal" in the name and creates a new variable named "Sepal.Sum".
Importantly, the solution needs to rely on a grep (or dplyr:::matches, dplyr:::one_of, etc.) when selecting the columns for the rowSums function, and have the name of the new column be dynamic.
My application has many new columns being created in a loop, so an even better solution would use mutate_each_ to generate many of these new columns.
Here is a dplyr solution that uses contains, one of the special helper functions meant to be used inside select.
iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>% rowSums()) -> iris2
head(iris2)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
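The snippet above hard-codes the new column name, whereas the question asks for a dynamic one. A minimal sketch of one way to do that, assuming dplyr 0.7 or later (where `!!` and `:=` are available inside mutate); the variable name df is only illustrative and not part of the original answer:

```r
library(dplyr)

new_name <- "Sepal.Sum"
col_grep <- "Sepal"

# Work on a fresh copy of the built-in data set (df is an illustrative name).
df <- datasets::iris

# `!!new_name :=` takes the column name from the variable;
# select(df, matches(col_grep)) picks the columns whose names match the pattern.
df2 <- mutate(df, !!new_name := rowSums(select(df, matches(col_grep))))

head(df2)
```

This keeps the regular-expression column selection from the question and avoids the temporary tmp_name column followed by a rename.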
And here are the benchmarks:
Unit: milliseconds
expr
iris2 <- iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>% rowSums())
min lq mean median uq max neval
1.816496 1.86304 2.132217 1.928748 2.509996 5.252626 100
I didn't want to post this as a comment because it's too long.
There is not much between the proposed solutions in terms of timing (except for the data.table solution, which appears slower), and none stands out as clearly more elegant.
library(dplyr)
library(data.table)
new_name <- "Sepal.Sum"
col_grep <- "Sepal"
# Make iris bigger
data(iris)
for(i in 1:18){
  iris <- bind_rows(iris, iris)
}
iris1 <- iris
system.time({
  # Base solution
  iris1 <- cbind(iris1, tmp_name = rowSums(iris1[,grep(col_grep, names(iris1))]))
  names(iris1)[names(iris1) == "tmp_name"] <- new_name
})
# 1.26
system.time({
  # less elegant dplyr solution
  iris %>% select(matches(col_grep)) %>% rowSums() %>%
    data.frame(.) %>% bind_cols(iris, .) %>% setNames(., c(names(iris), new_name))
})
# 1.14
system.time({
  # bit more elegant dplyr solution
  iris %>% mutate(tmp_name = rowSums(.[] %>% select(matches(col_grep)))) %>%
    rename_(.dots = setNames("tmp_name", new_name))
})
# 1.12
data(iris)
# Make iris bigger
for(i in 1:18){
  iris <- rbindlist(list(iris, iris))
}
system.time({
  setDT(iris)[, tmp_name := rowSums(.SD[,grep(col_grep, names(iris)), with = FALSE])]
  setnames(iris, "tmp_name", new_name)
})
# 2.39
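For the use case in the question, where many such columns are created in a loop, the same idea might be sketched as follows. The sum_specs vector (new column name mapped to a column pattern) is a hypothetical structure introduced only for illustration, and the sketch assumes dplyr 0.7 or later for `!!` and `:=`. It has not been timed against the solutions above.

```r
library(dplyr)

# Hypothetical mapping: new column name -> pattern of the columns to sum.
sum_specs <- c(Sepal.Sum = "Sepal", Petal.Sum = "Petal", Width.Sum = "Width")

# Start from a fresh copy of the built-in data set.
df <- datasets::iris

for (new_name in names(sum_specs)) {
  # Select from the original data so that sum columns created in earlier
  # iterations are never matched by a later pattern.
  cols <- select(datasets::iris, matches(sum_specs[[new_name]]))
  df <- mutate(df, !!new_name := rowSums(cols))
}

head(df)
```

Selecting from the untouched data on every iteration is a deliberate choice: a pattern such as "Sepal" would otherwise also match a previously added Sepal.Sum column.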