Finding sum of data frame column in rows that contain certain value in R

I'm working on a March Madness project. I have a data frame df.A with every team and season. For example:

Season   Team Name   Code
  2003   Creighton   2003-1166
  2003   Notre Dame  2003-1323
  2003   Arizona     2003-1112

And another data frame df.B with the results of every game in every season:

WTeamScore  LTeamScore  WTeamCode  LTeamCode
15          10          2003-1166  2003-1323
20          15          2003-1323  2003-1112
10          5           2003-1112  2003-1166

I'm trying to get a column in df.A that totals the number of points each team scored across both wins and losses. Basically:

Season   Team Name   Code        Points
  2003   Creighton   2003-1166   20
  2003   Notre Dame  2003-1323   30
  2003   Arizona     2003-1112   25

There are obviously thousands more rows in each data frame, but this is the general idea. What would be the best way of going about this?

Here is another option using tidyverse, where we pivot df.B to long form, get the sum for each team, then join back to df.A.

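library(tidyverse)

df.B %>%
  # Split each column name into a W/L prefix ("rep") and the value it holds
  # (Score or Code), giving one row per team per game
  pivot_longer(everything(), names_pattern = "(WTeam|LTeam)(.*)",
               names_to = c("rep", ".value")) %>%
  # Total the points for each team code
  group_by(Code) %>%
  summarise(Points = sum(Score)) %>%
  # Attach the totals to df.A, keeping every row of df.A
  left_join(df.A, ., by = "Code")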

Output

  Season  Team.Name      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25

Data

df.A <- structure(list(Season = c(2003L, 2003L, 2003L), Team.Name = c("Creighton", 
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323", 
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))

df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L, 
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), class = "data.frame", row.names = c(NA, 
-3L))
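
The same reshape-then-sum idea can also be sketched without any packages. This is only an illustration, assuming the column names from the example data; the names long, totals and result are made up here and are not part of the answer above.

```r
# Example data, same values as in the answer above
df.A <- data.frame(
  Season = c(2003L, 2003L, 2003L),
  Team.Name = c("Creighton", "Notre Dame", "Arizona"),
  Code = c("2003-1166", "2003-1323", "2003-1112"),
  stringsAsFactors = FALSE
)
df.B <- data.frame(
  WTeamScore = c(15L, 20L, 10L),
  LTeamScore = c(10L, 15L, 5L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166"),
  stringsAsFactors = FALSE
)

# Stack winners and losers into one long table: one row per team per game
long <- data.frame(
  Code = c(df.B$WTeamCode, df.B$LTeamCode),
  Score = c(df.B$WTeamScore, df.B$LTeamScore),
  stringsAsFactors = FALSE
)

# Sum the scores for each team code
totals <- aggregate(Score ~ Code, data = long, FUN = sum)
names(totals)[names(totals) == "Score"] <- "Points"

# Look up each team's total; a team with no games at all would get NA
result <- df.A
result$Points <- totals$Points[match(result$Code, totals$Code)]
result
```

Because totals has exactly one row per code, looking it up with match is safe here even when a team plays many games.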

We may use match (from base R) between Code in df.A and WTeamCode and LTeamCode in df.B to get the matching row indices, use them to extract the corresponding Score values, and add (+) the two. Note that match returns only the first matching position, so this picks up one win and one loss per team, which is enough for the example data.

df.A$Points <- with(df.A, df.B$WTeamScore[match(Code, 
       df.B$WTeamCode)] + 
       df.B$LTeamScore[match(Code, df.B$LTeamCode)])

-output

> df.A
  Season   TeamName      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25

If some codes have no match, match returns missing values (NA). In that case, cbind the two vectors to create a matrix and use rowSums with na.rm = TRUE:

df.A$Points <- with(df.A, rowSums(cbind(df.B$WTeamScore[match(Code, 
   df.B$WTeamCode)],  
     df.B$LTeamScore[match(Code, df.B$LTeamCode)]), na.rm = TRUE))

data

df.A <- structure(list(Season = c(2003L, 2003L, 2003L), TeamName = c("Creighton", 
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323", 
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))

df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L, 
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), 
class = "data.frame", row.names = c(NA, 
-3L))
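
With thousands of games, a team will usually appear more than once in WTeamCode or LTeamCode, and match only ever reports the first position. A minimal sketch of one way to total every game per team while staying in base R, assuming the same column names; the extra fourth game and the names games, teams, win_pts and loss_pts are hypothetical and used only for illustration.

```r
# Hypothetical data: the three example games plus a second Creighton win
games <- data.frame(
  WTeamScore = c(15L, 20L, 10L, 30L),
  LTeamScore = c(10L, 15L, 5L, 25L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112", "2003-1166"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166", "2003-1112"),
  stringsAsFactors = FALSE
)
teams <- data.frame(
  Code = c("2003-1166", "2003-1323", "2003-1112"),
  stringsAsFactors = FALSE
)

# match() reports only the first row where the code appears as a winner
match("2003-1166", games$WTeamCode)

# Sum all winning scores and all losing scores per team code
win_pts <- tapply(games$WTeamScore, games$WTeamCode, sum)
loss_pts <- tapply(games$LTeamScore, games$LTeamCode, sum)

# Look up each team's two totals; a team with no wins (or no losses) gets NA,
# which rowSums() then ignores
teams$Points <- rowSums(cbind(
  as.vector(win_pts)[match(teams$Code, names(win_pts))],
  as.vector(loss_pts)[match(teams$Code, names(loss_pts))]
), na.rm = TRUE)
teams
```

Here match is applied to the names of the per-team totals, which are unique, so every game contributes to the sum.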
