COUNTIF equivalent in dplyr summarise
I have a data frame listing, for each group (ID), the total number of students (Stu) and the number of those students who are taking part in an activity (Sub):
ID Stu Sub
(int) (int) (int)
1 101 80 NA
2 102 130 NA
3 103 10 NA
4 104 210 20
5 105 180 NA
6 106 150 NA
I would like to know the number of groups in each size band (>400, >200, >100, >0) that are either involved in an activity (Sub > 0) or not (Sub is NA):
output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L),
Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
Sub = c(NA,NA, NA, 20L, NA, NA)),
.Names = c("ID", "Stu", "Sub"),
class = c("tbl_df", "data.frame"),
row.names = c(NA, -6L))
library(dplyr)
temp <- output %>%
  mutate(Stu = ifelse(Stu >= 400, 400,
               ifelse(Stu >= 200, 200,
               ifelse(Stu >= 100, 100, 0)))) %>%
  group_by(Stu) %>%
  summarise(entries = length(!is.na(Sub)),
            noentries = length(is.na(Sub)))
The results should be:
Stu entries noentries
(dbl) (int) (int)
1 0 0 2
2 100 0 3
3 200 1 0
But I get:
Stu entries noentries
(dbl) (int) (int)
1 0 2 2
2 100 3 3
3 200 1 1
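How can I make the length function inside summarise act like a COUNTIF?
summarise() expects a single value per group. length() returns the number of elements in the logical vector whether they are TRUE or FALSE, so both columns simply report the group size. sum() counts only the TRUE values, which is what COUNTIF does: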
output %>%
  mutate(Stu = ifelse(Stu >= 400, 400,
               ifelse(Stu >= 200, 200,
               ifelse(Stu >= 100, 100, 0)))) %>%
  group_by(Stu) %>%
  summarise(entries = sum(!is.na(Sub)),
            noentries = sum(is.na(Sub)))
Source: local data frame [3 x 3]
Stu entries noentries
(dbl) (int) (int)
1 0 0 2
2 100 0 3
3 200 1 0
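To see why sum() works where length() does not, it may help to look at a single group outside of summarise. The sketch below uses a made-up vector named sub (not part of the original data) and only base R:

```r
# Hypothetical values for one group: two missing entries and one real one
sub <- c(NA, NA, 20L)

# is.na() returns a logical vector with one element per input element
missing_flags <- is.na(sub)

# length() ignores the TRUE/FALSE values and just counts elements
n_elements <- length(missing_flags)

# sum() treats TRUE as 1 and FALSE as 0, so it counts the matches
n_missing <- sum(missing_flags)
n_present <- sum(!missing_flags)
```

Here n_elements is 3, n_missing is 2 and n_present is 1, which is the COUNTIF-style behaviour the question asks for.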
Following the same idea provided by @eipi10, but cutting to the chase with count() instead of group_by() %>% tally(), and showing that tidyr::spread can mimic reshape2::dcast:
output %>%
  count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'),
        Stu = cut(Stu, c(0, 100, 200, 400, +Inf), labels = c(0, 100, 200, 400))) %>%
  tidyr::spread(Sub, n, fill = 0)
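In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshaping with that function, assuming a recent dplyr and tidyr are installed; the data frame is rebuilt here as a plain data.frame so the snippet runs on its own, and the name result is only for illustration:

```r
library(dplyr)
library(tidyr)

# Same values as the question's data, rebuilt as a plain data.frame
output <- data.frame(ID  = 101:106,
                     Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
                     Sub = c(NA, NA, NA, 20L, NA, NA))

result <- output %>%
  # Count rows per (Sub label, Stu band) combination
  count(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
        Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400))) %>%
  # One column per Sub label; combinations that never occur become 0
  pivot_wider(names_from = Sub, values_from = n, values_fill = list(n = 0L))
```

The row order of result may differ from the spread() version, so add arrange(Stu) if the order matters.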
Another option is to group by both Stu and Sub, but to do that we first need to recode the values of Sub and Stu to match the output groupings we want. We also use cut, instead of nested ifelse, to set the value breaks in Stu:
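library(dplyr)
library(reshape2)
output %>%
  group_by(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
           Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400))) %>%
  tally %>%
  # name the value column explicitly so dcast does not have to guess it
  dcast(Stu ~ Sub, fill = 0, value.var = "n")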
  Stu Entries No Entries
1   0       0          2
2 100       0          3
3 200       1          0
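One detail to watch when swapping nested ifelse for cut: by default cut() builds intervals that are closed on the right, so a value sitting exactly on a break lands in the lower band, and 0 falls outside the first interval. The nested ifelse with >= puts such values in the upper band. The sketch below uses made-up boundary values (not from the question's data) to show both behaviours; right = FALSE is the setting that matches the >= logic:

```r
breaks <- c(0, 100, 200, 400, Inf)
labels <- c(0, 100, 200, 400)

# Hypothetical values chosen to sit on the band boundaries
x <- c(0, 100, 150, 200, 400)

# Default (right = TRUE): intervals (0,100], (100,200], (200,400], (400,Inf]
default_bands <- as.character(cut(x, breaks, labels = labels))

# right = FALSE: intervals [0,100), [100,200), [200,400), [400,Inf)
left_closed_bands <- as.character(cut(x, breaks, labels = labels, right = FALSE))
```

With the default, default_bands is NA, "0", "100", "100", "200"; with right = FALSE, left_closed_bands is "0", "100", "100", "200", "400". The sample data in the question contains no boundary values, so both settings give the same table there.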