R sum rows of a data.frame, that contain only numbers in a certain column
Finding sum of data frame column in rows that contain certain value in R
I'm working on a March Madness project. I have a data frame, df.A, with one row per team and season. For example:
Season  Team Name   Code
2003    Creighton   2003-1166
2003    Notre Dame  2003-1323
2003    Arizona     2003-1112
Another data frame, df.B, contains the result of every game in every season:
WTeamScore  LTeamScore  WTeamCode  LTeamCode
15          10          2003-1166  2003-1323
20          15          2003-1323  2003-1112
10          5           2003-1112  2003-1166
I'm trying to add a column to df.A that totals each team's points across its wins and losses. Basically:
Season  Team Name   Code       Points
2003    Creighton   2003-1166  20
2003    Notre Dame  2003-1323  30
2003    Arizona     2003-1112  25
Each data frame obviously has thousands more rows, but this is the general idea. What is the best way to approach this?
Here is another option using tidyverse: pivot df.B to long format, get the sum for each team, and then join the result back onto df.A.
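library(tidyverse)

df.B %>%
  # split each column name into a W/L prefix ("rep") and a value column (Score or Code)
  pivot_longer(everything(), names_pattern = "(WTeam|LTeam)(.*)",
               names_to = c("rep", ".value")) %>%
  group_by(Code) %>%
  summarise(Points = sum(Score)) %>%
  # join the per-team totals back onto df.A
  left_join(df.A, ., by = "Code")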
Output
  Season  Team.Name      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25
Data
df.A <- structure(list(Season = c(2003L, 2003L, 2003L),
    Team.Name = c("Creighton", "Notre Dame", "Arizona"),
    Code = c("2003-1166", "2003-1323", "2003-1112")),
    class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L),
    LTeamScore = c(10L, 15L, 5L),
    WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
    LTeamCode = c("2003-1323", "2003-1112", "2003-1166")),
    class = "data.frame", row.names = c(NA, -3L))
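For readers who want to see the reshape step spelled out, here is a minimal base R sketch of the same idea: stack winners and losers into one long table, total per team, then attach the totals to df.A. It is an illustration rather than the answer's own code. The names long, side, totals, and result are introduced here, and it assumes R 4.0 or later so that strings stay as character vectors.

```r
# Sample data as given in the question
df.A <- data.frame(
  Season = c(2003L, 2003L, 2003L),
  Team.Name = c("Creighton", "Notre Dame", "Arizona"),
  Code = c("2003-1166", "2003-1323", "2003-1112")
)
df.B <- data.frame(
  WTeamScore = c(15L, 20L, 10L),
  LTeamScore = c(10L, 15L, 5L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166")
)

# Long format: one row per team per game, winners stacked on top of losers
# ("side" plays the role of the "rep" column in the pivot_longer call)
long <- data.frame(
  side = rep(c("WTeam", "LTeam"), each = nrow(df.B)),
  Score = c(df.B$WTeamScore, df.B$LTeamScore),
  Code = c(df.B$WTeamCode, df.B$LTeamCode)
)

# Total points per team code
totals <- aggregate(Score ~ Code, data = long, FUN = sum)
names(totals)[names(totals) == "Score"] <- "Points"

# Attach the totals to df.A, keeping df.A's row order
result <- df.A
result$Points <- totals$Points[match(result$Code, totals$Code)]
print(result)
```

Because the totals are computed on the long table, every game a team appears in contributes to its sum, whether the team won or lost.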
We can use match (from base R) between 'Code' in 'df.A' and 'WTeamCode', 'LTeamCode' in df.B to get the matching indices, extract the corresponding 'Score' columns, and add them together (+).
df.A$Points <- with(df.A,
    df.B$WTeamScore[match(Code, df.B$WTeamCode)] +
    df.B$LTeamScore[match(Code, df.B$LTeamCode)])
Output
> df.A
  Season   TeamName      Code Points
1   2003  Creighton 2003-1166     20
2   2003 Notre Dame 2003-1323     30
3   2003    Arizona 2003-1112     25
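To make the lookup concrete, the sketch below prints the intermediate index vectors that match produces on the sample data. The names w_idx and l_idx are introduced here for illustration only.

```r
df.A <- data.frame(Code = c("2003-1166", "2003-1323", "2003-1112"))
df.B <- data.frame(
  WTeamScore = c(15L, 20L, 10L),
  LTeamScore = c(10L, 15L, 5L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166")
)

# Position in df.B of the first game each team won / lost
w_idx <- match(df.A$Code, df.B$WTeamCode)
l_idx <- match(df.A$Code, df.B$LTeamCode)
print(w_idx)  # 1 2 3
print(l_idx)  # 3 1 2

# Scores picked out by those positions, then added element-wise
print(df.B$WTeamScore[w_idx])                          # 15 20 10
print(df.B$LTeamScore[l_idx])                          # 5 10 15
print(df.B$WTeamScore[w_idx] + df.B$LTeamScore[l_idx]) # 20 30 25
```

Note that match reports only the first matching position for each code, so this lookup sees one win and one loss per team.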
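If a code has no match, match returns a missing value (NA) and the sum becomes NA. In that case, cbind the two vectors to create a matrix and use rowSums with na.rm = TRUE.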
df.A$Points <- with(df.A, rowSums(cbind(
    df.B$WTeamScore[match(Code, df.B$WTeamCode)],
    df.B$LTeamScore[match(Code, df.B$LTeamCode)]), na.rm = TRUE))
Data
df.A <- structure(list(Season = c(2003L, 2003L, 2003L),
    TeamName = c("Creighton", "Notre Dame", "Arizona"),
    Code = c("2003-1166", "2003-1323", "2003-1112")),
    class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L),
    LTeamScore = c(10L, 15L, 5L),
    WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
    LTeamCode = c("2003-1323", "2003-1112", "2003-1166")),
    class = "data.frame", row.names = c(NA, -3L))
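Because match returns only the first matching position, the lookup above counts one win and one loss per team. On the full data, where a team plays many games, the scores would likely need to be totaled per team first. The following is one possible base R sketch using rowsum, not the answer's own method. The fourth game in df.B is made up for illustration, and won, lost, and pts are names introduced here.

```r
df.A <- data.frame(
  Season = c(2003L, 2003L, 2003L),
  TeamName = c("Creighton", "Notre Dame", "Arizona"),
  Code = c("2003-1166", "2003-1323", "2003-1112")
)
# The fourth game is hypothetical, added so that one team wins twice
df.B <- data.frame(
  WTeamScore = c(15L, 20L, 10L, 30L),
  LTeamScore = c(10L, 15L, 5L, 25L),
  WTeamCode = c("2003-1166", "2003-1323", "2003-1112", "2003-1166"),
  LTeamCode = c("2003-1323", "2003-1112", "2003-1166", "2003-1112")
)

# rowsum() adds up the scores within each team code across all games;
# the result is a one-column matrix whose row names are the codes
won  <- rowsum(df.B$WTeamScore, df.B$WTeamCode)
lost <- rowsum(df.B$LTeamScore, df.B$LTeamCode)

# Look up each team's totals by code; a team with no wins (or no losses)
# yields NA, which rowSums(..., na.rm = TRUE) treats as zero
pts <- cbind(won[match(df.A$Code, rownames(won)), 1],
             lost[match(df.A$Code, rownames(lost)), 1])
df.A$Points <- rowSums(pts, na.rm = TRUE)
print(df.A)
```

With the extra game, Creighton's total is 15 + 30 from its two wins plus 5 from its loss, which the single-match lookup would not capture.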