如何根据列名将功能应用于特定列？

Question

我正在使用类似于以下内容的广泛数据集：

我正在寻找一个函数，我可以迭代具有相似名称但名称不同的列集。 就函数本身而言，为了简单起见，我将创建一个取两列平均值的函数。

avg <- function(data, scorecol, distcol) {
  ScoreDistanceAvg = (scorecol + distcol)/2
  data$ScoreDistanceAvg <- ScoreDistanceAvg
  return(data)
}

avg(data = dat, scorecol = dat$ScoreGame0, distcol = dat$DistanceGame0)

如何将新函数应用于名称重复但数字不同的列集？ 也就是说，如何创建一个取 ScoreGame0 和 DistanceGame0 均值的列，然后创建一个取 ScoreGame5 和 DistanceGame5 均值的列，等等？ 这将是最终输出：

当然，我可以多次运行该函数，但由于我的完整数据集要大得多，我该如何自动化这个过程呢？ 我想它涉及应用，但我不确定如何将应用与这样的重复模式一起使用。 此外，我想它可能涉及重写函数以更好地自动化列的命名。

数据：

structure(list(Player = c("Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Lebron James", "Lebron James", "Lebron James", 
"Lebron James", "Steph Curry", "Steph Curry", "Steph Curry", 
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", 
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry"), 
    Game = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L, 
    0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L), ScoreGame0 = c(32L, 
    32L, 32L, 32L, 32L, 32L, 44L, 44L, 44L, 44L, 44L, 44L, 45L, 
    45L, 45L, 45L, 45L, 45L, 76L, 76L, 76L, 76L, 76L, 76L), ScoreGame5 = c(27L, 
    27L, 27L, 27L, 27L, 27L, 12L, 12L, 12L, 12L, 12L, 12L, 76L, 
    76L, 76L, 76L, 76L, 76L, 32L, 32L, 32L, 32L, 32L, 32L), DistanceGame0 = c(12L, 
    12L, 12L, 12L, 12L, 12L, 79L, 79L, 79L, 79L, 79L, 79L, 18L, 
    18L, 18L, 18L, 18L, 18L, 88L, 88L, 88L, 88L, 88L, 88L), DistanceGame5 = c(13L, 
    13L, 13L, 13L, 13L, 13L, 34L, 34L, 34L, 34L, 34L, 34L, 42L, 
    42L, 42L, 42L, 42L, 42L, 54L, 54L, 54L, 54L, 54L, 54L)), class = "data.frame", row.names = c(NA, 
-24L))

Answer 1

稍微重写你的函数，并通过mapply在列grep使用它。 sort使这更加安全。

avg <- function(scorecol, distcol) {
  (scorecol + distcol)/2
}

mapply(avg, dat[sort(grep('ScoreGame', names(dat)))], dat[sort(grep('DistanceGame', names(dat)))])
#       ScoreGame0 ScoreGame5
#  [1,]       22.0         20
#  [2,]       22.0         20
#  [3,]       22.0         20
#  [4,]       22.0         20
#  [5,]       22.0         20
#  [6,]       22.0         20
#  [7,]       61.5         23
#  [8,]       61.5         23
#  [9,]       61.5         23
# [10,]       61.5         23
# [11,]       61.5         23
# [12,]       61.5         23
# [13,]       31.5         59
# [14,]       31.5         59
# [15,]       31.5         59
# [16,]       31.5         59
# [17,]       31.5         59
# [18,]       31.5         59
# [19,]       82.0         43
# [20,]       82.0         43
# [21,]       82.0         43
# [22,]       82.0         43
# [23,]       82.0         43
# [24,]       82.0         43

看看grep做了什么尝试

grep('DistanceGame', names(dat), value=TRUE)
# [1] "DistanceGame0" "DistanceGame5"

Answer 2

这是一个带有 forloop 和readr的解决方案：

library(readr)

game_num <- names(dat) |> 
  readr::parse_number() |> 
  na.omit()

for(i in unique(game_num)) {
  avg <- paste0("ScoreDistanceAvg", i)
  score <- paste0("ScoreGame", i)
  distance <- paste0("DistanceGame", i)
  dat[[avg]] <- (dat[[score]] + dat[[distance]])/2
}

这使：

         Player Game ScoreGame0 ScoreGame5 DistanceGame0 DistanceGame5 ScoreDistanceAvg0 ScoreDistanceAvg5
1  Lebron James    0         32         27            12            13              22.0                20
2  Lebron James    1         32         27            12            13              22.0                20
3  Lebron James    2         32         27            12            13              22.0                20
4  Lebron James    3         32         27            12            13              22.0                20
5  Lebron James    4         32         27            12            13              22.0                20
6  Lebron James    5         32         27            12            13              22.0                20
7  Lebron James    0         44         12            79            34              61.5                23
8  Lebron James    1         44         12            79            34              61.5                23
9  Lebron James    2         44         12            79            34              61.5                23
10 Lebron James    3         44         12            79            34              61.5                23
11 Lebron James    4         44         12            79            34              61.5                23
12 Lebron James    5         44         12            79            34              61.5                23
13  Steph Curry    0         45         76            18            42              31.5                59

Answer 3

在基础 R 中：

cols_used <- names(df[, -(1:2)])
f <- sub("[^0-9]+", 'ScoreDistance', cols_used)    
data.frame(lapply(split.default(df[cols_used], f), rowMeans))

  ScoreDistance0 ScoreDistance5
1            22.0             20
2            22.0             20
3            22.0             20
4            22.0             20
5            22.0             20
6            22.0             20
7            61.5             23
8            61.5             23
9            61.5             23
10           61.5             23
11           61.5             23
12           61.5             23
13           31.5             59
14           31.5             59
15           31.5             59
16           31.5             59
17           31.5             59
18           31.5             59
19           82.0             43
20           82.0             43
21           82.0             43
22           82.0             43
23           82.0             43
24           82.0             43

使用 tidyverse：

如何根据列名将功能应用于特定列？

问题描述

3 个解决方案

解决方案1
2 2022-06-01 21:23:21

解决方案2
1 2022-06-01 21:31:04

解决方案3
1 2022-06-01 21:45:18

如何根据列名将功能应用于特定列？

问题描述

3 个解决方案

解决方案1 2 2022-06-01 21:23:21

解决方案2 1 2022-06-01 21:31:04

解决方案3 1 2022-06-01 21:45:18

解决方案1
2 2022-06-01 21:23:21

解决方案2
1 2022-06-01 21:31:04

解决方案3
1 2022-06-01 21:45:18