Find the sum of data frame columns in rows containing a specific value in R

Question

I'm working on a March Madness project. I have a data frame df.A with every team and season. For example:

Season   Team Name   Code
  2003   Creighton   2003-1166
  2003   Notre Dame  2003-1323
  2003   Arizona     2003-1112

And another data frame df.B with the results of every game in every season:

WTeamScore  LTeamScore  WTeamCode  LTeamCode
15          10          2003-1166  2003-1323
20          15          2003-1323  2003-1112
10          5           2003-1112  2003-1166

I'm trying to get a column in df.A that totals the points each team scored across both its wins and its losses. Basically:

Season   Team Name   Code        Points
  2003   Creighton   2003-1166   20
  2003   Notre Dame  2003-1323   30
  2003   Arizona     2003-1112   25

There are obviously thousands more rows in each data frame, but this is the general idea. What would be the best way of going about this?

Answer 1

Here is another option using tidyverse, where we can pivot df.B to long form, then get the sum for each team, then join back to df.A.

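library(tidyverse)

df.B %>%
  # Split each column name into a W/L prefix ("rep") and a value column (Score or Code)
  pivot_longer(everything(), names_pattern = "(WTeam|LTeam)(.*)",
               names_to = c("rep", ".value")) %>%
  # One total per team code, counting wins and losses together
  group_by(Code) %>%
  summarise(Points = sum(Score)) %>%
  # Attach the totals to df.A, keeping every row of df.A
  left_join(df.A, ., by = "Code")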

Output

  Season  Team.Name      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25

Data

df.A <- structure(list(Season = c(2003L, 2003L, 2003L), Team.Name = c("Creighton", 
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323", 
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))

df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L, 
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), class = "data.frame", row.names = c(NA, 
-3L))
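
For readers without the tidyverse, the same three steps (reshape to long form, sum per team, look the totals up for df.A) can be sketched in base R. This is only an illustration under the assumption that the column names match the toy data above; the names `long` and `totals` are introduced here and are not part of the original answer, and the final step uses a `match()` lookup in place of `left_join()`.

```r
# Same toy data as in the answer above.
df.A <- data.frame(
  Season = c(2003L, 2003L, 2003L),
  Team.Name = c("Creighton", "Notre Dame", "Arizona"),
  Code = c("2003-1166", "2003-1323", "2003-1112")
)
df.B <- data.frame(
  WTeamScore = c(15L, 20L, 10L),
  LTeamScore = c(10L, 15L, 5L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166")
)

# Long form: one row per team per game (winner rows stacked on loser rows).
long <- data.frame(
  Code = c(df.B$WTeamCode, df.B$LTeamCode),
  Score = c(df.B$WTeamScore, df.B$LTeamScore)
)

# Total points per team code, over wins and losses together.
totals <- aggregate(Score ~ Code, data = long, FUN = sum)

# Look up each team's total; a team with no games at all would get NA.
df.A$Points <- totals$Score[match(df.A$Code, totals$Code)]
df.A
```

On the toy data this gives Points of 20, 30 and 25, in line with the output shown above.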

Answer 2

We may use match (from base R) between 'Code' in df.A and 'WTeamCode', 'LTeamCode' in df.B to get the matching index, extract the corresponding 'Score' columns, and add them ( + ). Note that match returns only the first matching position, so this works as written only when each team appears at most once as a winner and at most once as a loser.

df.A$Points <- with(df.A, df.B$WTeamScore[match(Code, 
       df.B$WTeamCode)] + 
       df.B$LTeamScore[match(Code, df.B$LTeamCode)])

-output

> df.A
  Season   TeamName      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25

If there are nonmatches resulting in missing values ( NA ) from match, cbind the vectors to create a matrix and use rowSums with na.rm = TRUE

df.A$Points <- with(df.A, rowSums(cbind(df.B$WTeamScore[match(Code, 
   df.B$WTeamCode)],  
     df.B$LTeamScore[match(Code, df.B$LTeamCode)]), na.rm = TRUE))
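
To make the NA case concrete, here is a small sketch with made-up data: the team codes "T1" and "T2" and the data frames `teams` and `games` are hypothetical and stand in for df.A and df.B. "T1" never lost and "T2" never won, so each lookup has one nonmatch.

```r
# Hypothetical data: a single game in which "T1" beat "T2" 15-10.
teams <- data.frame(Code = c("T1", "T2"))
games <- data.frame(
  WTeamScore = 15L, LTeamScore = 10L,
  WTeamCode = "T1", LTeamCode = "T2"
)

# match() yields NA where a team never appears in that column,
# and indexing with NA yields an NA score.
w <- games$WTeamScore[match(teams$Code, games$WTeamCode)]  # 15, NA
l <- games$LTeamScore[match(teams$Code, games$LTeamCode)]  # NA, 10

# Plain addition propagates the NA for both teams.
plain <- w + l                                    # NA NA

# rowSums() with na.rm = TRUE treats the missing side as 0.
safe <- rowSums(cbind(w, l), na.rm = TRUE)        # 15 10
```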

data

df.A <- structure(list(Season = c(2003L, 2003L, 2003L), TeamName = c("Creighton", 
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323", 
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))

df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L, 
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), 
class = "data.frame", row.names = c(NA, 
-3L))
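
Because match() reports only the first matching position, the lookup above picks up a single win and a single loss per team. With a full season of games, one way to adapt it is to total the scores per code first and then do the match() lookup on those totals. The sketch below uses hypothetical data (`teams`, `games`, codes "T1" to "T3") and swaps in tapply() for the aggregation; it is not the original answer's code.

```r
# Hypothetical data: "T1" wins twice and "T3" loses twice.
teams <- data.frame(Code = c("T1", "T2", "T3"))
games <- data.frame(
  WTeamScore = c(15L, 20L, 12L),
  LTeamScore = c(10L, 5L, 8L),
  WTeamCode = c("T1", "T1", "T2"),
  LTeamCode = c("T2", "T3", "T3")
)

# match() alone sees only the first win of "T1" (row 1), not the second.
first_win_row <- match("T1", games$WTeamCode)

# Sum every score a team recorded, whether it won or lost the game.
totals <- tapply(
  c(games$WTeamScore, games$LTeamScore),
  c(games$WTeamCode, games$LTeamCode),
  sum
)

# Look up each team's total by code; a team with no games would get NA.
teams$Points <- as.vector(totals[match(teams$Code, names(totals))])
teams
```

Here "T1" ends up with 35 points (15 + 20), "T2" with 22 (12 + 10) and "T3" with 13 (5 + 8).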

Answer 1: 3, accepted, 2022-03-12 21:20:47

Answer 2: 1, 2022-03-12 20:17:25
