Find first occurrence of value in group using dplyr mutate
How do I find the first occurrence of a certain value within a group using dplyr?
The following code gives the desired result, but I'm wondering if there is a shorter way to do it.
Also, I am worried that group_by, mutate, or some other function might implicitly rearrange the rows. Could this be an issue?
mtcars %>% select( cyl, carb) %>% group_by( cyl ) %>%
mutate( "occurence_of_4" = carb == 4 ) %>%
dplyr::arrange( cyl ) %>%
group_by( cyl, occurence_of_4) %>%
mutate( "count" = 1:n(),
"first_4_in_cyl_group" = ifelse( occurence_of_4==TRUE & count==1, TRUE, FALSE))
The variable first_4_in_cyl_group is TRUE for the first occurrence of "4" in each cylinder group, FALSE otherwise:
Source: local data frame [32 x 5]
Groups: cyl, occurence_of_4
cyl carb occurence_of_4 count first_4_in_cyl_group
1 4 1 FALSE 1 FALSE
2 4 2 FALSE 2 FALSE
3 4 2 FALSE 3 FALSE
4 4 1 FALSE 4 FALSE
5 4 2 FALSE 5 FALSE
6 4 1 FALSE 6 FALSE
7 4 1 FALSE 7 FALSE
8 4 1 FALSE 8 FALSE
9 4 2 FALSE 9 FALSE
10 4 2 FALSE 10 FALSE
11 4 2 FALSE 11 FALSE
12 6 4 TRUE 1 TRUE
13 6 4 TRUE 2 FALSE
14 6 1 FALSE 1 FALSE
15 6 1 FALSE 2 FALSE
16 6 4 TRUE 3 FALSE
17 6 4 TRUE 4 FALSE
18 6 6 FALSE 3 FALSE
19 8 2 FALSE 1 FALSE
20 8 4 TRUE 1 TRUE
21 8 3 FALSE 2 FALSE
22 8 3 FALSE 3 FALSE
23 8 3 FALSE 4 FALSE
24 8 4 TRUE 2 FALSE
25 8 4 TRUE 3 FALSE
26 8 4 TRUE 4 FALSE
27 8 2 FALSE 5 FALSE
28 8 2 FALSE 6 FALSE
29 8 4 TRUE 5 FALSE
30 8 2 FALSE 7 FALSE
31 8 4 TRUE 6 FALSE
32 8 8 FALSE 8 FALSE
You may use !duplicated.
mtcars %>%
select(cyl, carb) %>%
group_by(cyl) %>%
mutate(first_4 = carb == 4 & !duplicated(carb == 4)) %>%
arrange(cyl)
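On the row-order worry from the question: a grouped mutate returns rows in their input order, and only arrange reorders them. The sketch below is one way to check that on mtcars; the `orig_row` helper column is an assumption added for illustration and is not part of the original answer.

```r
library(dplyr)

res <- mtcars %>%
  select(cyl, carb) %>%
  mutate(orig_row = row_number()) %>%   # hypothetical helper: remember the input order
  group_by(cyl) %>%
  mutate(first_4 = carb == 4 & !duplicated(carb == 4)) %>%
  ungroup()

# TRUE if group_by + mutate left the rows where they were
identical(res$orig_row, seq_len(nrow(mtcars)))

# Positions of the flagged rows in the original data
which(res$first_4)
```

Dropping the final arrange(cyl) from the answer therefore keeps the original row order while still flagging the first 4 per group.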
A couple of modifications:
Removed the first mutate step by creating the "occurence_of_4" variable within the group_by.
ifelse is not needed, as the output will already be TRUE/FALSE.
library(dplyr)
mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl, occurence_of_4 = carb == 4) %>%
  arrange(cyl) %>%
  mutate(count = row_number(),
         first_4_in_cyl_group = occurence_of_4 & count == 1)
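To make the "create the variable inside group_by" step easier to see, here is a minimal sketch on a small made-up data frame. The names `toy`, `g`, `x` and `is_4` are hypothetical and do not come from the original post.

```r
library(dplyr)

# Hypothetical toy data: two groups, with the value 4 repeated in each
toy <- data.frame(g = c("a", "a", "b", "b", "b"),
                  x = c(4, 4, 1, 4, 4))

out <- toy %>%
  group_by(g, is_4 = x == 4) %>%              # is_4 is created and used as a grouping key
  mutate(first_4 = is_4 & row_number() == 1) %>%  # first row of each (g, is_4) group
  ungroup()

out$first_4
```

Because the logical column is part of the grouping, row_number() restarts for the TRUE rows and the FALSE rows separately, which is why no ifelse is needed.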
Instead of grouping, it is enough to arrange by cyl and carb. With lag you can check the previous value.
The mtcars data set doesn't have an ID column, so if you are rearranging rows you can add the row names as a column with add_rownames (as docendodiscimus suggested in the comments).
mtcars %>%
select( cyl, carb ) %>%
add_rownames() %>%
arrange(cyl, carb) %>%
mutate(
isfirst = (carb == 4 & (is.na(lag(carb)) | lag(carb) != 4))) %>%
filter(isfirst)
The result:
# rowname cyl carb isfirst
# 1 Mazda RX4 6 4 TRUE
# 2 Duster 360 8 4 TRUE
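One caveat with the lag comparison is that it is not group-aware: it works here because, after sorting, no cyl group ends with a 4 immediately before another group that starts with a 4. A sketch of a variant that keeps the grouping and avoids sorting is shown below. It uses cumsum to pick the first match, and tibble::rownames_to_column, which the dplyr documentation names as the replacement for the deprecated add_rownames. The name `first_fours` is an assumption for illustration.

```r
library(dplyr)
library(tibble)

first_fours <- mtcars %>%
  rownames_to_column("rowname") %>%   # keep the car names as an ID column
  select(rowname, cyl, carb) %>%
  group_by(cyl) %>%
  # cumsum counts the 4s seen so far in the group; the first one has count 1
  mutate(isfirst = carb == 4 & cumsum(carb == 4) == 1) %>%
  ungroup() %>%
  filter(isfirst)

first_fours
```

Since nothing is rearranged, "first" here means first in the original row order of each cyl group.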