How to “stack” and “unstack” in R, for summary statistics with 2 factors
Summary Statistics for Multiple Factors in R?
For each game, I have game data that looks like this:
ID | Pos | Team | Opp | Score |
---|---|---|---|---|
0 | A | Duck | Frog | 2 |
1 | B | Duck | Frog | 15 |
2 | B | Duck | Frog | 20 |
3 | C | Duck | Frog | 7 |
4 | C | Duck | Frog | 9.5 |
5 | C | Duck | Frog | 10 |
6 | A | Frog | Duck | 3 |
7 | A | Frog | Duck | 0.5 |
8 | B | Frog | Duck | 17 |
9 | B | Frog | Duck | 13 |
10 | B | Frog | Duck | 21 |
11 | C | Frog | Duck | 8.5 |
I would like to get the average score for each position and team (and for their opponent), so the result looks like this.
ID | Pos | Team | Opp | Score | Team_A_Avg | Opp_A_Avg | Team_B_Avg | Opp_B_Avg | Team_C_Avg | Opp_C_Avg |
---|---|---|---|---|---|---|---|---|---|---|
0 | A | Duck | Frog | 2 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
1 | B | Duck | Frog | 15 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
2 | B | Duck | Frog | 20 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
3 | C | Duck | Frog | 7 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
4 | C | Duck | Frog | 9.5 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
5 | C | Duck | Frog | 10 | 2 | 1.75 | 17.5 | 17 | 8.8333 | 8.5 |
6 | A | Frog | Duck | 3 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
7 | A | Frog | Duck | 0.5 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
8 | B | Frog | Duck | 17 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
9 | B | Frog | Duck | 13 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
10 | B | Frog | Duck | 21 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
11 | C | Frog | Duck | 8.5 | 1.75 | 2 | 17 | 17.5 | 8.5 | 8.8333 |
What is the best way to go about this?
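As @Maurits Evers commented, the way you present your output doesn't really make sense. It seems you want a separate output with the average score for each team and position. Also, you only gave us one score per row, which I assume is the Team's score, so we don't have the opponent's scores to compute an average from. I would use the dplyr `summarise` function.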
Here is your data:

    game = data.frame(id = 0:11,
                      Pos = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
                      Team = c("Duck", "Duck", "Duck", "Duck", "Duck", "Duck", "Frog", "Frog", "Frog", "Frog", "Frog", "Frog"),
                      Opp = c("Frog", "Frog", "Frog", "Frog", "Frog", "Frog", "Duck", "Duck", "Duck", "Duck", "Duck", "Duck"),
                      Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5))
First, the average for each position:

    library(dplyr)

    # Create a new data frame, Pos_av, by piping (%>%) game through the functions below
    Pos_av = game %>%
      # Group by the variable we want the average for
      group_by(Pos) %>%
      # Name the new variable (Pos_Mean) and define the summary function (here, the mean)
      summarise(Pos_Mean = mean(Score))
Then the same for the team means:

    Team_av = game %>%
      group_by(Team) %>%
      summarise(Team_Mean = mean(Score))
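To get the average for each team and position, group by both variables:

    Both_av = game %>%
      group_by(Team, Pos) %>%
      # .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped result and silences the regrouping message
      summarise(Mean = mean(Score), .groups = "drop")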
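
The grouped means above are in long format (one row per team and position), whereas the question asks for them as extra columns on every row. A minimal sketch of one way to bridge that gap, assuming the tidyr package is available alongside dplyr 1.0.0 or later: pivot the means to one column per position, then join them back twice, once on `Team` and once on `Opp`. The names `wide_av`, `team_cols`, `opp_cols`, and `result` are made up for this illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer, rebuilt so this block runs on its own
game <- data.frame(
  id    = 0:11,
  Pos   = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
  Team  = rep(c("Duck", "Frog"), each = 6),
  Opp   = rep(c("Frog", "Duck"), each = 6),
  Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5),
  stringsAsFactors = FALSE
)

# One row per team, one column per position (A, B, C)
wide_av <- game %>%
  group_by(Team, Pos) %>%
  summarise(Mean = mean(Score), .groups = "drop") %>%
  pivot_wider(names_from = Pos, values_from = Mean)

# Rename the position columns to Team_<pos>_Avg, keeping Team as the join key
team_cols <- wide_av %>%
  rename_with(~ paste0("Team_", .x, "_Avg"), -Team)

# Same means, but keyed on Opp and named Opp_<pos>_Avg
opp_cols <- wide_av %>%
  rename(Opp = Team) %>%
  rename_with(~ paste0("Opp_", .x, "_Avg"), -Opp)

# Attach the team's own means and the opponent's means to every row
result <- game %>%
  left_join(team_cols, by = "Team") %>%
  left_join(opp_cols, by = "Opp")
```

Note that the opponent columns here are simply the opposing team's own averages, since the data has only one score per row.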
You can loop over the data frame and over all the conditions, setting each cell to the mean for the matching team or opponent at that position:

    ## The variable holding the data.frame is named "df"
    ## Expand the data frame with the desired columns
    for (t in c("Team", "Opp")) {
      for (p in c("A", "B", "C")) {
        df[[paste0(t, "_", p, "_Avg")]] = NA
      }
    }
    ## Loop through the data to compute the means
    for (i in seq_len(nrow(df))) {
      for (t in c("Team", "Opp")) {
        for (p in c("A", "B", "C")) {
          ## For row i, take the team named in column t (the row's own team or its opponent)
          ## and store that team's mean score at position p
          df[[paste0(t, "_", p, "_Avg")]][i] = mean(df$Score[df$Team == df[[t]][i] & df$Pos == p])
        }
      }
    }
This results in the data frame:

    > df
       Id Pos Team  Opp Score Team_A_Avg Team_B_Avg Team_C_Avg Opp_A_Avg Opp_B_Avg Opp_C_Avg
    1   0   A Duck Frog   2.0       2.00       17.5   8.833333      1.75      17.0  8.500000
    2   1   B Duck Frog  15.0       2.00       17.5   8.833333      1.75      17.0  8.500000
    3   2   B Duck Frog  20.0       2.00       17.5   8.833333      1.75      17.0  8.500000
    4   3   C Duck Frog   7.0       2.00       17.5   8.833333      1.75      17.0  8.500000
    5   4   C Duck Frog   9.5       2.00       17.5   8.833333      1.75      17.0  8.500000
    6   5   C Duck Frog  10.0       2.00       17.5   8.833333      1.75      17.0  8.500000
    7   6   A Frog Duck   3.0       1.75       17.0   8.500000      2.00      17.5  8.833333
    8   7   A Frog Duck   0.5       1.75       17.0   8.500000      2.00      17.5  8.833333
    9   8   B Frog Duck  17.0       1.75       17.0   8.500000      2.00      17.5  8.833333
    10  9   B Frog Duck  13.0       1.75       17.0   8.500000      2.00      17.5  8.833333
    11 10   B Frog Duck  21.0       1.75       17.0   8.500000      2.00      17.5  8.833333
    12 11   C Frog Duck   8.5       1.75       17.0   8.500000      2.00      17.5  8.833333
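
The same columns could probably be built without looping over rows. The sketch below is an assumption, not part of the answer above: it uses base R's `tapply` to build a team-by-position matrix of means and then looks up each row's team and opponent in it. The name `means` is hypothetical, and `df` is rebuilt here so the block runs on its own.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  Id    = 0:11,
  Pos   = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
  Team  = rep(c("Duck", "Frog"), each = 6),
  Opp   = rep(c("Frog", "Duck"), each = 6),
  Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5),
  stringsAsFactors = FALSE
)

# Matrix of mean scores: one row per team, one column per position
means <- tapply(df$Score, list(df$Team, df$Pos), mean)

# For each position, look up the mean for every row's team and opponent by name
for (p in colnames(means)) {
  df[[paste0("Team_", p, "_Avg")]] <- unname(means[as.character(df$Team), p])
  df[[paste0("Opp_", p, "_Avg")]]  <- unname(means[as.character(df$Opp), p])
}
```

This version computes each mean once instead of once per row, and it produces the columns in the order shown in the question (`Team_A_Avg`, `Opp_A_Avg`, `Team_B_Avg`, and so on) rather than the order in the loop's output.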