I'd like to ask a question inspired by a question asked years ago here on Stack Overflow.
Given the data frame input_df:
  num_col_1 num_col_2 text_col_1 text_col_2
1         1         4        yes        yes
2         2         5         no        yes
3         3         6         no       <NA>
this chunk of code
library(dplyr)
input_df %>%
  mutate(sum_yes = rowSums(.[c("text_col_1", "text_col_2")] == "yes", na.rm = TRUE))
will produce this new data frame:
> output_df
  num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
1         1         4        yes        yes       2
2         2         5         no        yes       1
3         3         6         no       <NA>       0
The question is: how do you do the same with the modern dplyr functions across and c_across?
Thank you.
1) c_across Used after rowwise, c_across
returns a vector holding the current row's values from the columns indicated by its argument, so sum can be applied to it one row at a time.
library(dplyr)
input_df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("text")) == "yes", na.rm = TRUE)) %>%
  ungroup()
giving:
# A tibble: 3 x 5
  num_col_1 num_col_2 text_col_1 text_col_2   sum
      <int>     <int> <chr>      <chr>      <int>
1         1         4 yes        yes            2
2         2         5 no         yes            1
3         3         6 no         <NA>           0
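2) across This gives the same values without rowwise. Inside mutate, across
returns a tibble with only the columns indicated by its argument, so rowSums can process all rows at once. Because rowSums returns a double, the sum column is <dbl> here rather than <int>.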
input_df %>%
  mutate(sum = rowSums(across(starts_with("text")) == "yes", na.rm = TRUE))
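To see why na.rm = TRUE matters in (2), it may help to look at the intermediate logical matrix. The following is a minimal sketch and not part of the original answer. Since across can only be called inside a dplyr verb, it uses select as a stand-in, and the names text_cols, is_yes, without_na_rm and with_na_rm are made up for illustration.

```r
library(dplyr)

# Same data as input_df in the question, built directly
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)

# Stand-in for across(starts_with("text")): a data frame of the text columns
text_cols <- select(input_df, starts_with("text"))

# Comparing a data frame with a string gives a logical matrix;
# the NA in row 3 is still NA after the comparison
is_yes <- text_cols == "yes"

# Without na.rm the NA propagates into the row sum for row 3
without_na_rm <- unname(rowSums(is_yes))

# With na.rm = TRUE the NA is dropped, so row 3 counts zero "yes" values
with_na_rm <- unname(rowSums(is_yes, na.rm = TRUE))
```

Printing without_na_rm and with_na_rm side by side shows the difference in the third row only.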
In case it is of interest, the following variants sum the values of the num columns whose matching text column (by position) is yes:
3) c_across
library(dplyr)
input_df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("num")) *
                   (c_across(starts_with("text")) == "yes"), na.rm = TRUE)) %>%
  ungroup()
giving:
# A tibble: 3 x 5
  num_col_1 num_col_2 text_col_1 text_col_2   sum
      <int>     <int> <chr>      <chr>      <int>
1         1         4 yes        yes            5
2         2         5 no         yes            5
3         3         6 no         <NA>           0
4) across The values are the same as in (3), although rowSums returns the sum column as a double.
input_df %>%
  mutate(sum = rowSums(across(starts_with("num")) *
                       (across(starts_with("text")) == "yes"), na.rm = TRUE))
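As a quick sanity check, the rowwise variant (3) and the rowSums variant (4) can be run on the same data and compared. This is only a sketch under the assumption that the data match input_df from the question. The names by_row and by_col are hypothetical, and as.numeric is used to ignore the integer versus double difference between sum and rowSums.

```r
library(dplyr)

# Same data as input_df in the question, built directly
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)

# Variant (3): one row at a time with rowwise() and c_across()
by_row <- input_df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("num")) *
                   (c_across(starts_with("text")) == "yes"), na.rm = TRUE)) %>%
  ungroup()

# Variant (4): all rows at once with across() and rowSums()
by_col <- input_df %>%
  mutate(sum = rowSums(across(starts_with("num")) *
                       (across(starts_with("text")) == "yes"), na.rm = TRUE))

# Both variants should agree on the values, whatever the storage type
same_values <- isTRUE(all.equal(as.numeric(by_row$sum), as.numeric(by_col$sum)))
```

If same_values is TRUE, either form can be used. The rowSums form avoids rowwise grouping altogether.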
The input in reproducible form:
Lines <- " num_col_1 num_col_2 text_col_1 text_col_2
1 1 4 yes yes
2 2 5 no yes
3 3 6 no NA"
input_df <- read.table(text = Lines)