R extracting the frequencies
I am trying to get the frequencies, but my ids are repeating. Here is some sample data:
id <- c(1,1,2,2,3,3)
gender <- c("m","m","f","f","m","m")
score <- c(10,5,10,5,10,5)
data <- data.frame("id"=id,"gender"=gender, "score"=score)
> data
  id gender score
1  1      m    10
2  1      m     5
3  2      f    10
4  2      f     5
5  3      m    10
6  3      m     5
I would like to get the frequencies of the gender categories, but I have repeating ids. When I run the code below:
gender <- as.data.frame(table(data$gender))
> gender
  Var1 Freq
1    f    2
2    m    4
The frequencies should be female = 1, male = 2. It should look like this:
> gender
  Var1 Freq
1    f    1
2    m    2
How can I get this, taking the id information into account?
You can use data.table::uniqueN to count the number of unique ids per gender group:
library(data.table)
setDT(data)
data[, .(Freq = uniqueN(id)), gender]
#    gender Freq
# 1:      m    2
# 2:      f    1
The idea from @IceCreamToucan, with dplyr:
library(dplyr)
data %>%
  group_by(gender) %>%
  summarise(freq = n_distinct(id))
  gender  freq
  <fct>  <int>
1 f          1
2 m          2
In base R:
rowSums(table(data$gender, data$id) != 0)
f m
1 2
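To see why the one-liner above works, it can help to unpack it into steps. This is a minimal sketch on the question's sample data; the intermediate names ct, present and freq are only illustrative.

```r
# Sample data from the question
data <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3),
  gender = c("m", "m", "f", "f", "m", "m"),
  score  = c(10, 5, 10, 5, 10, 5)
)

# Cross-tabulate gender (rows) by id (columns); each cell holds a row count
ct <- table(data$gender, data$id)

# TRUE where a gender/id pair occurs at least once
present <- ct != 0

# Summing the TRUE values per row gives the number of distinct ids per gender
freq <- rowSums(present)
freq
```

Each id contributes at most one TRUE per row, no matter how many times it repeats in the data.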
Being late to the party, I was quite surprised by the sophisticated answers which use grouping or rowSums().
In base R, I would remove the duplicate id rows from the data.frame by subsetting with !duplicated(id), and then apply table() on the gender column. So, the code is
table(data[!duplicated(data$id), "gender"])
f m
1 2
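One thing worth checking: duplicated() marks the second and later occurrences of a value, so keeping one row per id means selecting the rows where it is FALSE. On the question's data each id appears exactly twice, so both subsets happen to give the same counts. A minimal sketch with made-up data (not from the question), in which id 1 appears three times, shows how they differ:

```r
# Hypothetical data: id 1 appears three times
d <- data.frame(
  id     = c(1, 1, 1, 2, 3, 3),
  gender = c("m", "m", "m", "f", "m", "m"),
  stringsAsFactors = FALSE
)

# First occurrence of each id: one row per id
keep_first <- table(d[!duplicated(d$id), "gender"])

# Only the repeated rows: id 2 drops out, id 1 is counted twice
keep_repeats <- table(d[duplicated(d$id), "gender"])

keep_first
keep_repeats
```

Here keep_first counts each id once, while keep_repeats counts only the extra rows and so misses any id that occurs a single time.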