
Compute row-wise counts using across or c_across

I'd like to ask a question inspired by this question, asked years ago here on Stack Overflow.

Given the data frame input_df:

  num_col_1 num_col_2 text_col_1 text_col_2
1         1         4        yes        yes
2         2         5         no        yes
3         3         6         no       <NA>

this chunk of code

library(dplyr)
# Count the "yes" values per row; na.rm = TRUE makes an NA count as 0
input_df %>%
  mutate(sum_yes = rowSums(.[c("text_col_1", "text_col_2")] == "yes", na.rm = TRUE))

will produce this new dataframe

> output_df
  num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
1         1         4        yes        yes       2
2         2         5         no        yes       1
3         3         6         no       <NA>       0

The question is: how do you do the same with the modern dplyr functions across and c_across?

Thank you.

1) c_across Here c_across returns a vector that combines the values, for the current row, of the columns indicated by its argument.

library(dplyr)

input_df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("text")) == "yes", na.rm = TRUE)) %>%
  ungroup()

giving:

# A tibble: 3 x 5
  num_col_1 num_col_2 text_col_1 text_col_2   sum
      <int>     <int> <chr>      <chr>      <int>
1         1         4 yes        yes            2
2         2         5 no         yes            1
3         3         6 no         <NA>           0
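To see what c_across hands to sum() in (1), the following minimal sketch records, for each row, how many values c_across returns and whether they arrive as a plain character vector. The column names n_values and is_vector and the object check are made up for this illustration, and input_df is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, built directly (assumption: character text columns)
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)

check <- input_df %>%
  rowwise() %>%
  mutate(
    # number of values c_across() combines for the current row
    n_values  = length(c_across(starts_with("text"))),
    # TRUE if those values form a character vector rather than a data frame
    is_vector = is.character(c_across(starts_with("text")))
  ) %>%
  ungroup()

check
```

Each row should report two values and a character vector, which is why the comparison with "yes" and the call to sum() work one row at a time.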

2) across This gives the same result. across returns a tibble with only the columns indicated by its argument.

input_df %>%
  mutate(sum = rowSums( across(starts_with("text")) == "yes", na.rm = TRUE)) 
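The na.rm = TRUE argument matters here because comparing NA with "yes" yields NA rather than FALSE. A small base R sketch of the same comparison, with input_df rebuilt inline and the helper names is_yes, without_na_rm and with_na_rm chosen only for this illustration:

```r
# Same data as in the question, built directly
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE for "yes", FALSE for other values, NA where the input is NA
is_yes <- as.matrix(input_df[c("text_col_1", "text_col_2")]) == "yes"

without_na_rm <- rowSums(is_yes)                # row 3 is NA
with_na_rm    <- rowSums(is_yes, na.rm = TRUE)  # row 3 is 0

without_na_rm
with_na_rm
```

Without na.rm = TRUE, any row containing a missing answer gets a missing count instead of the number of "yes" values actually observed.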

Summing the scores for yes

In case it is of interest to sum the scores corresponding to the yes values:

3) c_across

library(dplyr)

input_df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("num")) *
    (c_across(starts_with("text")) == "yes"), na.rm = TRUE)) %>%
  ungroup()

giving:

# A tibble: 3 x 5
  num_col_1 num_col_2 text_col_1 text_col_2   sum
      <int>     <int> <chr>      <chr>      <int>
1         1         4 yes        yes            5
2         2         5 no         yes            5
3         3         6 no         <NA>           0

4) across The output is the same as (3).

input_df %>%
  mutate(sum = rowSums(across(starts_with("num")) * 
                 (across(starts_with("text")) == "yes"), na.rm = TRUE))
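Both (3) and (4) multiply the selections element by element, so the columns are paired by position: num_col_1 with text_col_1 and num_col_2 with text_col_2. The base R sketch below spells out that pairing so the intermediate matrices can be inspected; the names nums, is_yes and weighted are chosen only for this illustration, and input_df is rebuilt inline.

```r
# Same data as in the question, built directly
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)

nums   <- as.matrix(input_df[c("num_col_1", "num_col_2")])
is_yes <- as.matrix(input_df[c("text_col_1", "text_col_2")]) == "yes"

# Element-wise product: a score is kept where the matching answer is "yes",
# becomes 0 where it is another value, and NA where the answer is missing
weighted <- rowSums(nums * is_yes, na.rm = TRUE)

weighted  # 5 5 0
```

This relies on the two selections having the same number of columns in the same order; if the num and text columns were ordered differently, the scores would be matched to the wrong answers.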

Note

The input in reproducible form:

Lines <- "  num_col_1 num_col_2 text_col_1 text_col_2
1         1         4        yes        yes
2         2         5         no        yes
3         3         6         no         NA"
input_df <- read.table(text = Lines)
