
How can I summarise numbers of levels (nlevels) in grouped data using dplyr?

I would like to use the summarise function in dplyr to extract the number of levels for each variable in my data frame, after grouping. Here is a reproducible copy of the data frame:

x=c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y=c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b")
z=c("x","x","x","y","y","p","p","p","p","t","v","v","m","m","n","o","o")
d=data.frame(x,y,z)

Here is the code I am using:

library(dplyr)
d %>%
  group_by(x) %>%
  summarise(total = n(),
            Y = nlevels(y),
            Z = nlevels(z))

But this generates Y and Z columns that count the levels in the whole data frame 'd' rather than within each group.
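
The behaviour can be reproduced without dplyr. The following is a minimal sketch, assuming y is stored as a factor (the default for data.frame() before R 4.0; the explicit factor() call and the name y_A are added here for illustration). Subsetting a factor keeps its full set of levels, so nlevels() on one group's rows still reports the count for the whole column:

```r
# Assumption: y is a factor, as it would be with stringsAsFactors = TRUE.
x <- c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y <- factor(c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b"))

# Rows belonging to group "A" only (y_A is an illustrative name)
y_A <- y[x == "A"]

nlevels(y)            # 6: levels across the whole column
nlevels(y_A)          # 6: unused levels are kept after subsetting
length(unique(y_A))   # 3: distinct values actually present in group "A"
```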

The data frame I would like to generate would look like this:

 x=c("A","B","C","D","E")
 total=c(5,4,3,3,2)
 Y=c(3,4,3,3,2)
 Z=c(2,1,2,2,1)
 d2=data.frame(x,total,Y,Z)
 d2

Thank you!

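You need n_distinct() for that. nlevels() reports the levels stored on the factor, and those stay the same within every group, whereas n_distinct() counts the values actually present in each group: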

d %>%
  group_by(x) %>%
  summarise(total = n(),
            Y = n_distinct(y),
            Z = n_distinct(z))

The result:

# A tibble: 5 x 4
       x total     Y     Z
  <fctr> <int> <int> <int>
1      A     5     3     2
2      B     4     4     1
3      C     3     3     2
4      D     3     3     2
5      E     2     2     1
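
If the level count itself is wanted, one alternative is to drop unused levels inside each group before calling nlevels(). The sketch below is not part of the original answer. It assumes factor columns (hence stringsAsFactors = TRUE, which is no longer the default from R 4.0 onward) and dplyr 1.0.0 or newer for across(). The names by_levels and by_distinct are introduced here for illustration.

```r
library(dplyr)

x <- c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y <- c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b")
z <- c("x","x","x","y","y","p","p","p","p","t","v","v","m","m","n","o","o")

# Assumption: columns are factors, matching the <fctr> column in the output above
d <- data.frame(x, y, z, stringsAsFactors = TRUE)

# droplevels() discards levels absent from the current group, so nlevels()
# then agrees with n_distinct() as long as the data contain no NA values
by_levels <- d %>%
  group_by(x) %>%
  summarise(total = n(),
            Y = nlevels(droplevels(y)),
            Z = nlevels(droplevels(z)))

# across() applies n_distinct() to several columns in one call;
# the result columns keep the original names y and z
by_distinct <- d %>%
  group_by(x) %>%
  summarise(total = n(),
            across(c(y, z), n_distinct))
```

Both approaches give the same counts for this data. n_distinct() also works when the columns are plain character vectors, which makes it the more robust choice on recent versions of R.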

