COUNTIF equivalent in dplyr summarise

Question

I have a data frame listing the total number of students (Stu) in each group (ID) and the number of those students taking part in an activity (Sub):

     ID   Stu   Sub
  (int) (int) (int)
1   101    80    NA
2   102   130    NA
3   103    10    NA
4   104   210    20
5   105   180    NA
6   106   150    NA

I would like to know the number of groups in each size band (>400, >200, >100, >0) that are either involved in an activity (Sub > 0) or not (Sub is NA).

library(dplyr)

output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L), 
                       Stu = c(80L, 130L, 10L, 210L, 180L, 150L), 
                       Sub = c(NA,NA, NA, 20L, NA, NA)), 
                  .Names = c("ID", "Stu", "Sub"), 
                  class = c("tbl_df", "data.frame"), 
                  row.names = c(NA, -6L))

temp <- output %>% 
mutate(Stu = ifelse(Stu >= 400, 400,
         ifelse(Stu >= 200, 200,
             ifelse(Stu >= 100, 100, 0
                 )))) %>%
group_by(Stu) %>%
summarise(entries = length(!is.na(Sub)),
          noentries = length(is.na(Sub)))

The results should be:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0

But I get:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       2         2
2   100       3         3
3   200       1         1

How can I make the length function in summarise act like a COUNTIF?

Answer 1

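summarise expects a single value per group, and length returns the number of elements in the logical vector whether they are TRUE or FALSE, so it always gives the group size. sum counts only the TRUE values, so sum instead of length does the job: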

output %>% 
  mutate(Stu = ifelse(Stu >= 400, 400,
                      ifelse(Stu >= 200, 200,
                             ifelse(Stu >= 100, 100, 0
                             )))) %>%
  group_by(Stu) %>% 
  summarise(entries = sum(!is.na(Sub)),
            noentries = sum(is.na(Sub)))

Source: local data frame [3 x 3]

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0
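
To see why length gave the group size in the question, here is a minimal sketch on a toy vector. The vector sub is made up for illustration and stands in for the Sub values of a single group:

```r
# Toy vector standing in for one group's Sub column (illustrative values only).
sub <- c(NA, 20L, NA)

n_elements <- length(!is.na(sub))  # counts every element of the logical vector
n_entries  <- sum(!is.na(sub))     # counts only TRUE, i.e. non-missing values
n_missing  <- sum(is.na(sub))      # counts only TRUE, i.e. missing values

print(c(n_elements, n_entries, n_missing))
```

This prints 3 1 2: length ignores the TRUE/FALSE values entirely, while sum treats TRUE as 1 and FALSE as 0, which is the COUNTIF behaviour asked for.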

Answer 2

Following the same idea provided by @eipi10, but cutting to the chase with count() instead of group_by() %>% tally() and showing that tidyr::spread can mimic reshape2::dcast:

output %>%
  count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'),
        Stu = cut(Stu, c(0, 100, 200, 400, +Inf), labels = c(0, 100, 200, 400))) %>%
  tidyr::spread(Sub, n, fill = 0)
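
In newer versions of tidyr, spread is marked as superseded by pivot_wider. The following is a rough sketch of the same reshaping with pivot_wider, assuming tidyr 1.0.0 or later is installed; the data frame is rebuilt here with data.frame only so the block runs on its own, and the name wide is just for illustration:

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so this block is self-contained.
output <- data.frame(ID  = 101:106,
                     Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
                     Sub = c(NA, NA, NA, 20L, NA, NA))

wide <- output %>%
  # Recode Sub into two labels and Stu into size bands.
  mutate(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
         Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400))) %>%
  # One row per (Stu, Sub) combination present in the data, with the count in n.
  count(Stu, Sub) %>%
  # One column per Sub label; combinations that never occur are filled with 0.
  pivot_wider(names_from = Sub, values_from = n, values_fill = list(n = 0L))

print(wide)
```

The column order follows the order in which the Sub labels first appear, so it may differ from the spread result.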

Answer 3

Another option is to group by both Stu and Sub, but to do that we need to first recode the values of Sub and Stu to match the output groupings we want. We also use cut, instead of nested ifelse, to set the value breaks in Stu:

library(dplyr)
library(reshape2)

output %>% 
  group_by(Sub=ifelse(is.na(Sub), "No Entries", "Entries"),
           Stu=cut(Stu, c(0,100,200,400,Inf), labels=c(0,100,200,400))) %>%
  tally %>%
  dcast(Stu ~ Sub, fill=0)

  Stu Entries No Entries
1   0       0          2
2 100       0          3
3 200       1          0
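
One thing worth checking when swapping nested ifelse for cut is what happens exactly on a break. By default cut closes intervals on the right, so a value of exactly 100 lands in the 0 band, whereas the Stu >= 100 test in the question would put it in the 100 band. A small sketch, using a made-up vector stu of boundary values:

```r
# Illustrative values chosen to sit on and between the breaks.
stu <- c(10, 100, 150, 200, 400)
breaks <- c(0, 100, 200, 400, Inf)
band_labels <- c(0, 100, 200, 400)

# Default: intervals (0,100], (100,200], (200,400], (400,Inf]
default_bands <- as.character(cut(stu, breaks, labels = band_labels))

# right = FALSE: intervals [0,100), [100,200), [200,400), [400,Inf),
# which matches the >= comparisons in the nested ifelse version.
left_closed <- as.character(cut(stu, breaks, labels = band_labels, right = FALSE))

print(default_bands)
print(left_closed)
```

For the data in the question both variants give the same bands, since no Stu value falls exactly on a break.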

Answer 1
3 Accepted 2016-05-22 18:17:46

Answer 2
3 2016-05-22 19:12:33

Answer 3
1 2016-05-22 18:47:12
