How do I get the sum of frequency count based on two columns?
Assuming that the dataframe is stored as someData, and is in the following format:
ID Team Games Medal
1 Australia 1992 Summer NA
2 Australia 1994 Summer Gold
3 Australia 1992 Summer Silver
4 United States 1991 Winter Gold
5 United States 1992 Summer Bronze
6 Singapore 1991 Summer NA
How would I count the frequencies of medals by Team, while excluding NA as a value? At the same time, the total frequency for each country should be summed, rather than displayed separately for Gold, Silver and Bronze.
In other words, I am trying to display the total number of medals per country, excluding NA.
I have tried something like this:
library(plyr)
counts <- ddply(olympics, .(olympics$Team, olympics$Medal), nrow)
names(counts) <- c("Country", "Medal", "Freq")
counts
But this just gives me a massive table of every medal for every country separately, including NA.
What I would like to get is the following:
Australia 2
United States 2
Any help would be greatly appreciated.
Thank you!
We can use count from dplyr, after filtering out the rows where Medal is NA:
library(dplyr)
df1 %>%
filter(!is.na(Medal)) %>%
count(Team)
# A tibble: 2 x 2
# Team n
# <fct> <int>
#1 Australia 2
#2 United States 2
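As a sketch of how the same pipeline applies to the question's data, assuming that df1 in the answer holds the same rows as someData (rebuilt inline here so the block runs on its own) and that dplyr is installed:

```r
library(dplyr)

# Assumption: same data as in the question, with character columns.
someData <- read.table(text = "ID Team Games Medal
1 Australia '1992 Summer' NA
2 Australia '1994 Summer' Gold
3 Australia '1992 Summer' Silver
4 'United States' '1991 Winter' Gold
5 'United States' '1992 Summer' Bronze
6 Singapore '1991 Summer' NA",
header = TRUE, stringsAsFactors = FALSE)

medal_counts <- someData %>%
  filter(!is.na(Medal)) %>%  # keep only rows that have a medal
  count(Team)                # one row per remaining Team, count in column n

medal_counts
```

Singapore does not appear in the result because its only row has Medal equal to NA and is removed by the filter before counting.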
You can do that in base R with table and colSums:
colSums(table(someData$Medal, someData$Team))
Australia Singapore United States
2 0 2
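This works because table drops NA values by default, so the cross-tabulation only has rows for Bronze, Gold and Silver, and colSums then adds those rows up per team. A minimal sketch of the intermediate steps follows; the variable names medal_by_team, totals and totals_nonzero are illustrative and not part of the original answer. The last step drops teams with no medals to match the output requested in the question.

```r
# Same data as in the Data section of this answer.
someData <- read.table(text = "ID Team Games Medal
1 Australia '1992 Summer' NA
2 Australia '1994 Summer' Gold
3 Australia '1992 Summer' Silver
4 'United States' '1991 Winter' Gold
5 'United States' '1992 Summer' Bronze
6 Singapore '1991 Summer' NA",
header = TRUE)

# Medal x Team contingency table; NA medals are excluded by default.
medal_by_team <- table(someData$Medal, someData$Team)

# Sum over the medal types to get one total per team.
totals <- colSums(medal_by_team)

# Optionally drop teams whose total is zero (here, Singapore).
totals_nonzero <- totals[totals > 0]
totals_nonzero
```

To count NA as its own category instead, table accepts useNA = "ifany", but that is the opposite of what the question asks for.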
Data
someData = read.table(text="ID Team Games Medal
1 Australia '1992 Summer' NA
2 Australia '1994 Summer' Gold
3 Australia '1992 Summer' Silver
4 'United States' '1991 Winter' Gold
5 'United States' '1992 Summer' Bronze
6 Singapore '1991 Summer' NA",
header=TRUE)