How can I apply a function to specific columns based on their names?

Question

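I am working with a wide dataset similar to the following:

I want to write a function that I can iterate over sets of columns whose names are similar but differ by number. As for the function itself, to keep things simple I will create one that takes the mean of two columns.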

avg <- function(data, scorecol, distcol) {
  ScoreDistanceAvg = (scorecol + distcol)/2
  data$ScoreDistanceAvg <- ScoreDistanceAvg
  return(data)
}

avg(data = dat, scorecol = dat$ScoreGame0, distcol = dat$DistanceGame0)

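How can I apply this new function to sets of columns whose names repeat but carry different numbers? That is, how can I create one column that takes the mean of ScoreGame0 and DistanceGame0, then another that takes the mean of ScoreGame5 and DistanceGame5, and so on? This would be the final output:

Of course I could run the function several times, but since my full dataset is much larger, how can I automate this process? I imagine it involves apply, but I am not sure how to use apply with a repeating pattern like this. I also imagine it may involve rewriting the function so that it names the new columns automatically.

Data: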

dat <- structure(list(Player = c("Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Steph Curry", "Steph Curry", "Steph Curry", 
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", 
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry"), 
    Game = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L, 
    0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L), ScoreGame0 = c(32L, 
    32L, 32L, 32L, 32L, 32L, 44L, 44L, 44L, 44L, 44L, 44L, 45L, 
    45L, 45L, 45L, 45L, 45L, 76L, 76L, 76L, 76L, 76L, 76L), ScoreGame5 = c(27L, 
    27L, 27L, 27L, 27L, 27L, 12L, 12L, 12L, 12L, 12L, 12L, 76L, 
    76L, 76L, 76L, 76L, 76L, 32L, 32L, 32L, 32L, 32L, 32L), DistanceGame0 = c(12L, 
    12L, 12L, 12L, 12L, 12L, 79L, 79L, 79L, 79L, 79L, 79L, 18L, 
    18L, 18L, 18L, 18L, 18L, 88L, 88L, 88L, 88L, 88L, 88L), DistanceGame5 = c(13L, 
    13L, 13L, 13L, 13L, 13L, 34L, 34L, 34L, 34L, 34L, 34L, 42L, 
    42L, 42L, 42L, 42L, 42L, 54L, 54L, 54L, 54L, 54L, 54L)), class = "data.frame", row.names = c(NA, 
-24L))

Answer 1

Rewrite your function slightly and apply it with mapply to the columns selected by grep. sort makes this a bit safer.

avg <- function(scorecol, distcol) {
  (scorecol + distcol)/2
}

mapply(avg, dat[sort(grep('ScoreGame', names(dat)))], dat[sort(grep('DistanceGame', names(dat)))])
#       ScoreGame0 ScoreGame5
#  [1,]       22.0         20
#  [2,]       22.0         20
#  [3,]       22.0         20
#  [4,]       22.0         20
#  [5,]       22.0         20
#  [6,]       22.0         20
#  [7,]       61.5         23
#  [8,]       61.5         23
#  [9,]       61.5         23
# [10,]       61.5         23
# [11,]       61.5         23
# [12,]       61.5         23
# [13,]       31.5         59
# [14,]       31.5         59
# [15,]       31.5         59
# [16,]       31.5         59
# [17,]       31.5         59
# [18,]       31.5         59
# [19,]       82.0         43
# [20,]       82.0         43
# [21,]       82.0         43
# [22,]       82.0         43
# [23,]       82.0         43
# [24,]       82.0         43

To see what grep does, try:

grep('DistanceGame', names(dat), value=TRUE)
# [1] "DistanceGame0" "DistanceGame5"
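The mapply call returns a matrix rather than the data frame with new columns that the question asks for. A minimal sketch of one way to attach the result follows, assuming a reduced four-row copy of the data and the output prefix ScoreDistanceAvg from the question. The suffix check is an added safeguard, not part of the answer. grep without value = TRUE already returns positions in ascending order, so this sketch sorts the matched names instead and then confirms that both families carry the same suffixes.

```r
# Averaging function from the answer
avg <- function(scorecol, distcol) {
  (scorecol + distcol) / 2
}

# Reduced four-row copy of the example data (assumption: one row per distinct value set)
dat <- data.frame(
  ScoreGame0    = c(32L, 44L, 45L, 76L),
  ScoreGame5    = c(27L, 12L, 76L, 32L),
  DistanceGame0 = c(12L, 79L, 18L, 88L),
  DistanceGame5 = c(13L, 34L, 42L, 54L)
)

# Select both column families by name and sort each one
score_cols <- sort(grep("^ScoreGame", names(dat), value = TRUE))
dist_cols  <- sort(grep("^DistanceGame", names(dat), value = TRUE))

# Added safeguard: stop if the two families do not pair up by suffix
stopifnot(identical(sub("^ScoreGame", "", score_cols),
                    sub("^DistanceGame", "", dist_cols)))

# One averaged column per pair, returned as a matrix
avgs <- mapply(avg, dat[score_cols], dat[dist_cols])

# Rename the matrix columns and bind them onto the data
colnames(avgs) <- sub("^ScoreGame", "ScoreDistanceAvg", score_cols)
dat_out <- cbind(dat, avgs)
```

Here dat_out holds the original columns followed by ScoreDistanceAvg0 and ScoreDistanceAvg5.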

Answer 2

Here is a solution with a for loop and readr:

library(readr)

game_num <- names(dat) |> 
  readr::parse_number() |> 
  na.omit()

for(i in unique(game_num)) {
  avg <- paste0("ScoreDistanceAvg", i)
  score <- paste0("ScoreGame", i)
  distance <- paste0("DistanceGame", i)
  dat[[avg]] <- (dat[[score]] + dat[[distance]])/2
}

This gives:

         Player Game ScoreGame0 ScoreGame5 DistanceGame0 DistanceGame5 ScoreDistanceAvg0 ScoreDistanceAvg5
1  Lebron James    0         32         27            12            13              22.0                20
2  Lebron James    1         32         27            12            13              22.0                20
3  Lebron James    2         32         27            12            13              22.0                20
4  Lebron James    3         32         27            12            13              22.0                20
5  Lebron James    4         32         27            12            13              22.0                20
6  Lebron James    5         32         27            12            13              22.0                20
7  Lebron James    0         44         12            79            34              61.5                23
8  Lebron James    1         44         12            79            34              61.5                23
9  Lebron James    2         44         12            79            34              61.5                23
10 Lebron James    3         44         12            79            34              61.5                23
11 Lebron James    4         44         12            79            34              61.5                23
12 Lebron James    5         44         12            79            34              61.5                23
13  Steph Curry    0         45         76            18            42              31.5                59
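The question also asks about rewriting the function so that it names the new columns itself. The loop above could be wrapped along these lines. This is only a sketch: the function name add_pair_avgs and its arguments are hypothetical, and it uses base R pattern matching in place of readr::parse_number to find the game numbers.

```r
# Hypothetical helper: for every <prefix1><n> column that has a matching
# <prefix2><n> column, add <out_prefix><n> holding the mean of the two.
add_pair_avgs <- function(data,
                          prefix1 = "ScoreGame",
                          prefix2 = "DistanceGame",
                          out_prefix = "ScoreDistanceAvg") {
  # Columns named prefix1 followed only by digits
  cols1 <- grep(paste0("^", prefix1, "[0-9]+$"), names(data), value = TRUE)
  # Numeric suffixes kept as strings, e.g. "0" and "5"
  ids <- sub(prefix1, "", cols1, fixed = TRUE)

  for (id in ids) {
    partner <- paste0(prefix2, id)
    if (!partner %in% names(data)) next  # skip columns without a partner
    data[[paste0(out_prefix, id)]] <-
      (data[[paste0(prefix1, id)]] + data[[partner]]) / 2
  }
  data
}

# Reduced four-row copy of the example data (assumption)
dat <- data.frame(
  ScoreGame0    = c(32L, 44L, 45L, 76L),
  ScoreGame5    = c(27L, 12L, 76L, 32L),
  DistanceGame0 = c(12L, 79L, 18L, 88L),
  DistanceGame5 = c(13L, 34L, 42L, 54L)
)

out <- add_pair_avgs(dat)
```

Unlike the loop, this version returns a modified copy and leaves dat itself unchanged.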

Answer 3

In base R:

cols_used <- names(dat[, -(1:2)])
f <- sub("[^0-9]+", 'ScoreDistance', cols_used)
data.frame(lapply(split.default(dat[cols_used], f), rowMeans))

  ScoreDistance0 ScoreDistance5
1            22.0             20
2            22.0             20
3            22.0             20
4            22.0             20
5            22.0             20
6            22.0             20
7            61.5             23
8            61.5             23
9            61.5             23
10           61.5             23
11           61.5             23
12           61.5             23
13           31.5             59
14           31.5             59
15           31.5             59
16           31.5             59
17           31.5             59
18           31.5             59
19           82.0             43
20           82.0             43
21           82.0             43
22           82.0             43
23           82.0             43
24           82.0             43
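The base R approach hinges on the grouping vector f, so it may help to look at the intermediate values. The sketch below assumes a reduced four-row copy of the data, with the Player and Game values filled in only for illustration. It shows how sub collapses each column name to a shared label and how split.default then groups the columns by that label. The final cbind step is an addition, not part of the answer.

```r
# Reduced four-row copy of the example data (assumption)
dat <- data.frame(
  Player        = c("Lebron James", "Lebron James", "Steph Curry", "Steph Curry"),
  Game          = c(0L, 0L, 0L, 0L),
  ScoreGame0    = c(32L, 44L, 45L, 76L),
  ScoreGame5    = c(27L, 12L, 76L, 32L),
  DistanceGame0 = c(12L, 79L, 18L, 88L),
  DistanceGame5 = c(13L, 34L, 42L, 54L)
)

# Every column except the first two (Player, Game)
cols_used <- names(dat)[-(1:2)]

# Replace the leading run of non-digits with a shared label, so that
# ScoreGame0 and DistanceGame0 both become "ScoreDistance0"
f <- sub("[^0-9]+", "ScoreDistance", cols_used)
print(f)
# [1] "ScoreDistance0" "ScoreDistance5" "ScoreDistance0" "ScoreDistance5"

# split.default splits the data frame by column: one sub-frame per label
groups <- split.default(dat[cols_used], f)
print(names(groups$ScoreDistance0))
# [1] "ScoreGame0"    "DistanceGame0"

# Row means within each group, bound back onto the original data (added step)
avgs <- data.frame(lapply(groups, rowMeans))
result <- cbind(dat, avgs)
```

Because rowMeans averages every column in a group, this also extends to more than two column families per game number, provided each name starts with non-digits and ends with the number.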

Using tidyverse:
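The code for this variant does not appear here, so the following is not the answer's own code. It is one possible sketch of a dplyr approach, assuming a reduced four-row copy of the data and the output prefix ScoreDistanceAvg from the question. It pairs each ScoreGame column with its DistanceGame partner by rewriting the current column name inside across.

```r
library(dplyr)

# Reduced four-row copy of the example data (assumption)
dat <- data.frame(
  ScoreGame0    = c(32L, 44L, 45L, 76L),
  ScoreGame5    = c(27L, 12L, 76L, 32L),
  DistanceGame0 = c(12L, 79L, 18L, 88L),
  DistanceGame5 = c(13L, 34L, 42L, 54L)
)

res <- dat |>
  mutate(across(
    starts_with("ScoreGame"),
    # cur_column() is the name of the column being processed; swapping
    # "Score" for "Distance" yields the name of its partner column
    ~ (.x + get(sub("Score", "Distance", cur_column()))) / 2,
    .names = "{.col}_avg"
  )) |>
  # Turn ScoreGame0_avg into ScoreDistanceAvg0, and so on
  rename_with(
    ~ sub("ScoreGame([0-9]+)_avg", "ScoreDistanceAvg\\1", .x),
    .cols = ends_with("_avg")
  )
```

The rename step is kept separate from the computation so that the pairing logic and the naming convention can be changed independently.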

3 answers

Answer 1: 2 2022-06-01 21:23:21
Answer 2: 1 2022-06-01 21:31:04
Answer 3: 1 2022-06-01 21:45:18
