R extracting the frequencies

I am trying to get frequencies, but my ids repeat. Here is some sample data:

id <- c(1,1,2,2,3,3)
gender <- c("m","m","f","f","m","m")
score <- c(10,5,10,5,10,5)
data <- data.frame("id"=id,"gender"=gender, "score"=score)

> data
  id gender score
1  1      m    10
2  1      m     5
3  2      f    10
4  2      f     5
5  3      m    10
6  3      m     5

I would like to get the frequencies of the gender categories, but the ids repeat. When I run the code below:

gender<-as.data.frame(table(data$gender))
> gender
  Var1 Freq
1    f    2
2    m    4

The frequency should be female = 1, male = 2. It should look like this:

> gender
  Var1 Freq
1    f    1
2    m    2

How can I get this, taking the id information into account?

You can use data.table::uniqueN to count the number of unique ids per gender group:

library(data.table)
setDT(data)

data[, .(Freq = uniqueN(id)), gender]

#    gender Freq
# 1:      m    2
# 2:      f    1

The idea from @IceCreamToucan with dplyr:

library(dplyr)

data %>%
  group_by(gender) %>%
  summarise(freq = n_distinct(id))

  gender  freq
  <fct>  <int>
1 f          1
2 m          2

In base R:

rowSums(table(data$gender,data$id)!=0)
f m 
1 2 
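
To see why this one-liner works, it can help to look at the intermediate steps. The sketch below rebuilds the sample data from the question and splits the expression apart; the names tab, present and freq are only illustrative and do not appear in the original answer.

```r
# Sample data from the question
data <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                   gender = c("m", "m", "f", "f", "m", "m"),
                   score = c(10, 5, 10, 5, 10, 5))

# Cross-tabulate gender (rows) against id (columns);
# each cell holds the number of rows for that gender/id pair
tab <- table(data$gender, data$id)

# TRUE where a gender/id pair occurs at least once
present <- tab != 0

# Summing the TRUE cells per row gives the number of distinct ids per gender
freq <- rowSums(present)
print(freq)
```

Because each id is counted at most once per gender, repeated rows for the same id no longer inflate the result.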

Being late to the party, I was quite surprised by the sophisticated answers that use grouping or rowSums().

In base R, I would

  1. remove the duplicate id rows from the data.frame by subsetting with !duplicated(id),
  2. apply table() on the gender column.

So, the code is

table(data[!duplicated(data$id), "gender"])

f m 
1 2 
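
A small check on hypothetical data (not from the question) where ids repeat an uneven number of times. It is only a sketch, and it assumes each id always carries the same gender. It illustrates that the rows to keep are the ones where duplicated() is FALSE: keeping the rows where it is TRUE retains only the repeats, which matches the expected counts for the question's data only because every id there appears exactly twice.

```r
# Hypothetical data: id 1 appears three times, id 2 once, id 3 twice
d <- data.frame(id = c(1, 1, 1, 2, 3, 3),
                gender = c("m", "m", "m", "f", "m", "m"),
                stringsAsFactors = FALSE)

# Keep the first row of each id, then count the genders
first_rows <- d[!duplicated(d$id), ]
freq <- table(first_rows$gender)
print(freq)

# Keeping the rows flagged by duplicated() retains only the repeats,
# so id 2 (which never repeats) drops out entirely
repeats <- table(d[duplicated(d$id), "gender"])
print(repeats)
```

An alternative under the same assumption is unique(d[, c("id", "gender")]) followed by table() on the gender column, which deduplicates on the id/gender pair instead of on id alone.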
