[英]Finding sum of data frame column in rows that contain certain value in R
I'm working on a March Madness project.我正在从事 March Madness 项目。 I have a data frame
df.A
with every team and season.我有一个数据框
df.A
每个球队和赛季。 For example:例如:
Season Team Name Code
2003 Creighton 2003-1166
2003 Notre Dame 2003-1323
2003 Arizona 2003-1112
And another data frame df.B
with game results of of every game every season:另一个数据框
df.B
包含每个赛季每场比赛的比赛结果:
WTeamScore LTeamScore WTeamCode LTeamCode
15 10 2003-1166 2003-1323
20 15 2003-1323 2003-1112
10 5 2003-1112 2003-1166
I'm trying to get a column in df.A
that totals the number of points in both wins and losses.我正在尝试在
df.A
中获得一个列,该列总计输赢点数。 Basically:基本上:
Season Team Name Code Points
2003 Creighton 2003-1166 20
2003 Notre Dame 2003-1323 30
2003 Arizona 2003-1112 25
There are obviously thousands more rows in each data frame, but this is the general idea.每个数据框中显然还有数千行,但这是一般的想法。 What would be the best way of going about this?
解决这个问题的最佳方法是什么?
Here is another option using tidyverse
, where we can pivot df.B
to long form, then get the sum for each team, then join back to df.A
.这是另一个使用
tidyverse
的选项,我们可以将 pivot df.B
为长格式,然后获取每个团队的总和,然后返回df.A
。
library(tidyverse)
df.B %>%
pivot_longer(everything(),names_pattern = "(WTeam|LTeam)(.*)",
names_to = c("rep", ".value")) %>%
group_by(Code) %>%
summarise(Points = sum(Score)) %>%
left_join(df.A, ., by = "Code")
Output Output
Season Team.Name Code Points
1 2003 Creighton 2003-1166 20
2 2003 Notre Dame 2003-1323 30
3 2003 Arizona 2003-1112 25
Data数据
df.A <- structure(list(Season = c(2003L, 2003L, 2003L), Team.Name = c("Creighton",
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323",
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L,
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), class = "data.frame", row.names = c(NA,
-3L))
We may use match
(from base R
) between 'Code' on 'df.A' to 'WTeamCode', 'LTeamCode' in df.B to get the matching index, to extract the corresponding 'Score' columns and get the sum ( +
)我们可以使用'df.A'上的'Code'到df.B中的'WTeamCode','LTeamCode'之间的
match
(来自base R
)来获取匹配索引,提取相应的'Score'列并获得总和( +
)
df.A$Points <- with(df.A, df.B$WTeamScore[match(Code,
df.B$WTeamCode)] +
df.B$LTeamScore[match(Code, df.B$LTeamCode)])
-output -输出
> df.A
Season TeamName Code Points
1 2003 Creighton 2003-1166 20
2 2003 Notre Dame 2003-1323 30
3 2003 Arizona 2003-1112 25
If there are nonmatches resulting in missing values ( NA
) from match
, cbind
the vectors to create a matrix
and use rowSums
with na.rm = TRUE
如果不匹配导致
match
缺失值 ( NA
), cbind
向量以创建matrix
并使用rowSums
with na.rm = TRUE
df.A$Points <- with(df.A, rowSums(cbind(df.B$WTeamScore[match(Code,
df.B$WTeamCode)],
df.B$LTeamScore[match(Code, df.B$LTeamCode)]), na.rm = TRUE))
df.A <- structure(list(Season = c(2003L, 2003L, 2003L), TeamName = c("Creighton",
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323",
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L,
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")),
class = "data.frame", row.names = c(NA,
-3L))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.