
R - Applying the Same Code to Multiple Columns

As part of my data cleaning, I have several dimension columns, each containing a name, and I need to aggregate multiple metric columns by each of them. The same code has to be applied to every dimension column. I could easily copy and paste the same chunk of code ten times and change the column reference, but surely there is a simpler solution.

My research is leading me to believe I am missing something obvious with the sapply() function that I can't put my finger on.

Very basic reprex:

library(tidyverse)

player_1 <- c("Smith", "Adams", "Washington")
player_2 <- c("Johnson", "Jefferson", "Fuller")
player_3 <- c("Forman", "Hyde", "Kelso")
metric_1 <- 1:3
metric_2 <- 2:4
metric_3 <- 3:5

df <- data.frame(player_1, player_2, player_3, metric_1, metric_2, metric_3)

p1 <- df %>% 
  group_by(player_1) %>% 
  summarize_at(c("metric_1", "metric_2", "metric_3"), sum)

Is there a way to only have to type this "p1" code once but have R loop through my columns player_1, player_2, and player_3?

If I can provide more detail, please let me know.

A couple of options:

  1. Use map() to iterate over each player column individually.
library(tidyverse)

cols <- paste0('player_', 1:3)

map(cols, ~df %>% 
           group_by(.data[[.x]]) %>% 
            summarise(across(starts_with('metric'), sum)))

#[[1]]
# A tibble: 3 x 4
#  player_1   metric_1 metric_2 metric_3
#* <chr>         <int>    <int>    <int>
#1 Adams             2        3        4
#2 Smith             1        2        3
#3 Washington        3        4        5

#[[2]]
# A tibble: 3 x 4
#  player_2  metric_1 metric_2 metric_3
#* <chr>        <int>    <int>    <int>
#1 Fuller           3        4        5
#2 Jefferson        2        3        4
#3 Johnson          1        2        3

#[[3]]
# A tibble: 3 x 4
#  player_3 metric_1 metric_2 metric_3
#* <chr>       <int>    <int>    <int>
#1 Forman          1        2        3
#2 Hyde            2        3        4
#3 Kelso           3        4        5
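The map() call above returns an unnamed list, so each result has to be picked out by position. As a minimal sketch, assuming you would rather look results up by column name, the character vector can be passed through purrr's set_names() first. The variable name `results` is only illustrative, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(purrr)

# Same toy data as in the question
df <- data.frame(
  player_1 = c("Smith", "Adams", "Washington"),
  player_2 = c("Johnson", "Jefferson", "Fuller"),
  player_3 = c("Forman", "Hyde", "Kelso"),
  metric_1 = 1:3, metric_2 = 2:4, metric_3 = 3:5
)

cols <- paste0("player_", 1:3)

# set_names() with no second argument names the vector after its own values,
# so map() returns a list named player_1, player_2, player_3
results <- cols %>%
  set_names() %>%
  map(~ df %>%
        group_by(.data[[.x]]) %>%
        summarise(across(starts_with("metric"), sum)))

# Look up one summary by the column it was grouped on
results$player_2
```

With a named list, each summary can be referred to as results$player_1 and so on, instead of results[[1]].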

  2. Get the data in long format, so that a single group_by() covers every player column.
df %>%
  pivot_longer(cols = starts_with('player')) %>%
  group_by(name, value) %>%
  summarise(across(starts_with('metric'), sum))

#  name     value      metric_1 metric_2 metric_3
#  <chr>    <chr>         <int>    <int>    <int>
#1 player_1 Adams             2        3        4
#2 player_1 Smith             1        2        3
#3 player_1 Washington        3        4        5
#4 player_2 Fuller            3        4        5
#5 player_2 Jefferson         2        3        4
#6 player_2 Johnson           1        2        3
#7 player_3 Forman            1        2        3
#8 player_3 Hyde              2        3        4
#9 player_3 Kelso             3        4        5


 