How to aggregate the count of unique values of categorical variables in R

Suppose I have a data set, data:

x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)

x1 x2
a  a1
a  a1 
a  a2
a  a1
a  a2
a  a3
b  b1
b  b1
b  b2 
b  b2

I want to find the number of unique values of x2 for each value of x1.

For example, a has 3 unique values (a1, a2 and a3) and b has 2 (b1 and b2).

I tried aggregate(x1~., data, sum), but it did not work because these columns are factors, not numbers.
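To see why that call fails, here is a minimal base R sketch. It assumes the x2 values from the table above and passes stringsAsFactors = TRUE so that the columns are factors, as the question describes (since R 4.0.0, data.frame() keeps strings as character by default). Note also that x1 ~ . treats x1 as the value being summarised and groups by x2, which is the reverse of the grouping wanted here.

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
# Assumption: factor columns, matching the question's description
data <- data.frame(x1, x2, stringsAsFactors = TRUE)

# x1 ~ . summarises x1 within each level of x2; sum() is not defined for
# factors, so aggregate() stops with an error. tryCatch() captures it.
result <- tryCatch(aggregate(x1 ~ ., data, sum), error = function(e) e)
inherits(result, "error")  # TRUE
```

Counting distinct values needs a function such as length(unique(x)) in place of sum(), with x2 on the left of the formula and x1 on the right.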

Please help.

Try:

 aggregate(x2~x1, data, FUN=function(x) length(unique(x)))
 #  x1 x2
 #1  a  3
 #2  b  2
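aggregate() splits the x2 values by x1 and calls FUN on each piece. The same split-apply-combine steps can be written out by hand in base R. This is a sketch for illustration, not how aggregate() is implemented internally, and it assumes the x2 values from the table in the question.

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

# Split the x2 values into one vector per value of x1 ...
groups <- split(data$x2, data$x1)
# ... then apply the same function that aggregate() receives as FUN to each piece
counts <- sapply(groups, function(x) length(unique(x)))
counts  # named integer vector: a = 3, b = 2
```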

Or:

 rowSums(table(unique(data)))
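The rowSums(table(unique(data))) one-liner depends on the unique() step: once duplicate rows are gone, every cell of the x1-by-x2 table is 0 or 1, so each row sum is the number of distinct x2 values. A step-by-step base R sketch, assuming the x2 values from the table in the question:

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

u <- unique(data)    # keep each distinct (x1, x2) pair once: 5 rows
tab <- table(u)      # 2 x 5 cross-tabulation of x1 by x2; every cell is 0 or 1
res <- rowSums(tab)  # each row sum counts the distinct x2 values for that x1
res                  # a = 3, b = 2

# Without unique(), the cells hold repeat counts and rowSums() returns
# the group sizes (6 and 4) instead of distinct counts
sizes <- rowSums(table(data))
```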

Or:

library(dplyr)
data %>% 
     group_by(x1) %>%
     summarise(n=n_distinct(x2))

Or another dplyr option, suggested by @Eric:

count(distinct(data), x1)
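count(distinct(data), x1) works in two steps: distinct() drops duplicate rows, then count() tallies the remaining rows per x1. A rough base R equivalent, assuming the question's data and no dplyr, could look like this:

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

u <- unique(data)  # distinct (x1, x2) rows, like distinct(data)
# Tally the remaining rows per x1; responseName names the count column "n"
res <- as.data.frame(table(x1 = u$x1), responseName = "n")
res
```

Unlike dplyr's count(), the x1 column here comes back as a factor, because as.data.frame() on a table converts the dimnames to factors by default.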

Or:

library(data.table)
setDT(data)[, uniqueN(x2) , x1]

Update

If you need both the unique values of 'x2' and their count per 'x1':

setDT(data)[, list(n=uniqueN(x2), x2=unique(x2)) , x1]
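This data.table call keeps one row per distinct (x1, x2) pair and attaches the group's count to every row. A base R sketch of the same result swaps in ave() to repeat each group's size on its rows; it assumes the question's data and is not a translation of data.table internals.

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

u <- unique(data)  # one row per distinct (x1, x2) pair
# For every row, ave() returns the number of rows in that row's x1 group,
# which after unique() equals the number of distinct x2 values
u$n <- ave(seq_len(nrow(u)), u$x1, FUN = length)
u
```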

Or only the unique values:

setDT(data)[, list(x2=unique(x2)) , x1]
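If a list of the unique x2 values for each x1 is more convenient than a long table, one base R sketch (assuming the question's data) is to split the deduplicated x2 column by x1:

```r
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

u <- unique(data)
# One character vector of distinct x2 values per x1, in order of first appearance
vals <- split(u$x2, u$x1)
vals
```

lengths(vals) then gives the counts (a = 3, b = 2) from the same object.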

Or, using dplyr:

distinct(data) %>%
     group_by(x1) %>%
     mutate(n=n_distinct(x2))

Or only the unique values:

distinct(data)

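Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.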

© 2020-2024 STACKOOM.COM