I would like to use the summarise function in dplyr to count the number of levels of each variable in my data frame, after grouping. Here is a reproducible example of the data frame:
x=c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y=c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b")
z=c("x","x","x","y","y","p","p","p","p","t","v","v","m","m","n","o","o")
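# stringsAsFactors=TRUE keeps the columns as factors (this was the default before R 4.0.0)
d=data.frame(x,y,z,stringsAsFactors=TRUE)
Here is the code I am using: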
library(dplyr)
d %>%
  group_by(x) %>%
  summarise(total=n(),
            Y=nlevels(y),
            Z=nlevels(z))
However, this generates Y and Z columns that count the levels across the whole data frame 'd' rather than within each group.
The data frame I would like to generate would look like this:
x=c("A","B","C","D","E")
total=c(5,4,3,3,2)
Y=c(3,4,3,3,2)
Z=c(2,1,2,2,1)
d2=data.frame(x,total,Y,Z)
d2
Thank you!
You need n_distinct for that:
d %>%
  group_by(x) %>%
  summarise(total = n(),
            Y = n_distinct(y),
            Z = n_distinct(z))
The result:
# A tibble: 5 x 4
       x total     Y     Z
  <fctr> <int> <int> <int>
1      A     5     3     2
2      B     4     4     1
3      C     3     3     2
4      D     3     3     2
5      E     2     2     1
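As for why nlevels gives the same number in every group: a factor keeps its full set of levels when you subset it, so each group still carries every level of the original column. Here is a small base R sketch of that behaviour. The vectors y_all and y_group are made up for illustration and are not from the question's data frame.

```r
# Hypothetical factor with four levels: a, b, c, d
y_all <- factor(c("a", "b", "c", "a", "b", "a", "b", "c", "d"))

# Take the first five values, as a grouped summary would for one group
y_group <- y_all[1:5]

# The subset still remembers all four levels, even though "d" is absent
n_all_levels <- nlevels(y_group)

# Dropping unused levels, or counting unique values, gives the per-group count
n_used_levels <- nlevels(droplevels(y_group))
n_unique <- length(unique(y_group))

print(c(n_all_levels, n_used_levels, n_unique))
```

n_distinct counts the unique values actually present in each group, which is why it gives the per-group numbers. Note also that if the columns are character vectors rather than factors (the default for data.frame since R 4.0.0), nlevels returns 0 instead.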