How to aggregate count of unique values of categorical variables in R
Suppose I have a data set, data:
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)
x1 x2
a a1
a a1
a a2
a a1
a a2
a a3
b b1
b b1
b b2
b b2
I want to find the number of unique values of x2 for each value of x1.
For example, a has only 3 unique values (a1, a2 and a3) and b has 2 values (b1 and b2).
I used aggregate(x1~.,data,sum), but it did not work since these are factors, not integers.
Please help.
Try
aggregate(x2~x1, data, FUN=function(x) length(unique(x)))
# x1 x2
#1 a 3
#2 b 2
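The reason sum fails while the function above works can be checked directly. This is a minimal sketch in base R; the factor f and the names sum_result and n_unique are only illustrative and stand in for the x2 column:

```r
# A small factor standing in for the x2 column (illustrative values)
f <- factor(c("a1", "a1", "a2", "a3"))

# sum() is not defined for factors, so this call raises an error;
# tryCatch() captures the error message instead of stopping the script
sum_result <- tryCatch(sum(f), error = function(e) conditionMessage(e))

# Counting distinct values works for factors and character vectors alike
n_unique <- length(unique(f))
n_unique
```

Since length(unique(x)) only needs to compare values, not add them, it should behave the same whether the columns are factors or character vectors.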
Or
rowSums(table(unique(data)))
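To see why the table approach works, it may help to look at the intermediate steps. A minimal sketch using only base R, with the example data rebuilt so the block runs on its own (the names pairs, tab, and counts are only illustrative):

```r
# Rebuild the example data so this block is self-contained
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# Step 1: drop duplicated rows, leaving one row per distinct (x1, x2) pair
pairs <- unique(data)

# Step 2: cross-tabulate; each cell is 1 if the pair occurs, 0 otherwise
tab <- table(pairs)

# Step 3: summing each row counts the distinct x2 values per x1
counts <- rowSums(tab)
counts
```

Because duplicates are removed first, every cell of the table is 0 or 1, so the row sums are the counts of distinct x2 values.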
Or
library(dplyr)
data %>%
group_by(x1) %>%
summarise(n=n_distinct(x2))
Or another option using dplyr, suggested by @Eric
count(distinct(data), x1)
Or
library(data.table)
setDT(data)[, uniqueN(x2) , x1]
If you need both the unique values of 'x2' and the count
setDT(data)[, list(n=uniqueN(x2), x2=unique(x2)) , x1]
Or only the unique values
setDT(data)[, list(x2=unique(x2)) , x1]
Or using dplyr
unique(data) %>%
group_by(x1) %>%
mutate(n=n_distinct(x2))
Or only the unique rows
unique(data)