Find the first occurrence of a value within a group using dplyr mutate

Question

How do I find the first occurrence of a certain value within a group using dplyr?

The following code gives the desired result, but I'm wondering if there is a shorter way to do it.

Also, I am worried that group_by, mutate, or some other function might implicitly rearrange the rows. Could this be an issue?

library(dplyr)

mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl) %>%
  # flag every row where carb is 4
  mutate("occurence_of_4" = carb == 4) %>%
  dplyr::arrange(cyl) %>%
  group_by(cyl, occurence_of_4) %>%
  # number the rows within each (cyl, flag) group; the first flagged row is the first 4
  mutate("count" = 1:n(),
         "first_4_in_cyl_group" = ifelse(occurence_of_4 == TRUE & count == 1, TRUE, FALSE))

The variable first_4_in_cyl_group is TRUE for the first occurrence of "4" in each cylinder group and FALSE otherwise:

Source: local data frame [32 x 5]
Groups: cyl, occurence_of_4

   cyl carb occurence_of_4 count first_4_in_cyl_group
1    4    1          FALSE     1                FALSE
2    4    2          FALSE     2                FALSE
3    4    2          FALSE     3                FALSE
4    4    1          FALSE     4                FALSE
5    4    2          FALSE     5                FALSE
6    4    1          FALSE     6                FALSE
7    4    1          FALSE     7                FALSE
8    4    1          FALSE     8                FALSE
9    4    2          FALSE     9                FALSE
10   4    2          FALSE    10                FALSE
11   4    2          FALSE    11                FALSE
12   6    4           TRUE     1                 TRUE
13   6    4           TRUE     2                FALSE
14   6    1          FALSE     1                FALSE
15   6    1          FALSE     2                FALSE
16   6    4           TRUE     3                FALSE
17   6    4           TRUE     4                FALSE
18   6    6          FALSE     3                FALSE
19   8    2          FALSE     1                FALSE
20   8    4           TRUE     1                 TRUE
21   8    3          FALSE     2                FALSE
22   8    3          FALSE     3                FALSE
23   8    3          FALSE     4                FALSE
24   8    4           TRUE     2                FALSE
25   8    4           TRUE     3                FALSE
26   8    4           TRUE     4                FALSE
27   8    2          FALSE     5                FALSE
28   8    2          FALSE     6                FALSE
29   8    4           TRUE     5                FALSE
30   8    2          FALSE     7                FALSE
31   8    4           TRUE     6                FALSE
32   8    8          FALSE     8                FALSE

Answer 1

You may use !duplicated.

mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl) %>%
  mutate(first_4 = carb == 4 & !duplicated(carb == 4))  %>%
  arrange(cyl)
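
To see why this works, here is a minimal sketch on a made-up vector (x is purely illustrative and not taken from mtcars). duplicated() marks every element that has already appeared earlier in the vector, so !duplicated(x == 4) is TRUE only for the first TRUE and the first FALSE of the comparison. Combining it with x == 4 keeps just the first 4.

```r
# Hypothetical input, for illustration only
x <- c(1, 4, 2, 4, 4)

is_four    <- x == 4               # which elements equal 4
first_seen <- !duplicated(is_four) # first TRUE and first FALSE of is_four
first_four <- is_four & first_seen # only the first element equal to 4

print(first_four)
```

Inside group_by(cyl) %>% mutate(...), the same expression is evaluated once per group, so each cylinder group gets its own first-4 flag.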

Answer 2

A couple of modifications:

Remove the first mutate step by creating the "occurence_of_4" variable inside group_by.

ifelse is not needed, because the expression already returns TRUE/FALSE.

library(dplyr)

mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl, occurence_of_4 = carb == 4) %>%
  arrange(cyl) %>%
  mutate(count = row_number(),
         first_4_in_cyl_group = occurence_of_4 & count == 1)
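
On the row-order concern from the question: as far as I know, group_by only records which rows belong to which group, and a grouped mutate returns the rows in their original order. The sketch below checks this for the approach above, with arrange left out; the names cars and flagged are made up for the example.

```r
library(dplyr)

# Hypothetical helper objects, for illustration only
cars <- mtcars %>% select(cyl, carb)

flagged <- cars %>%
  group_by(cyl, occurence_of_4 = carb == 4) %>%
  # row_number() counts within each (cyl, occurence_of_4) group
  mutate(first_4_in_cyl_group = occurence_of_4 & row_number() == 1) %>%
  ungroup()

# Stops with an error if grouping or mutate had reordered the rows
stopifnot(identical(flagged$cyl, cars$cyl),
          identical(flagged$carb, cars$carb))
```

So arrange(cyl) is only needed if you want the output sorted by cylinder; it is not required for the flag to be correct.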

Answer 3

Instead of grouping, it is enough to arrange by cyl and carb. With lag you can then check the previous value.

The mtcars data set has no ID column, so if you are rearranging rows you could add one with add_rownames (as docendodiscimus suggested in the comments).

mtcars   %>% 
  select( cyl, carb ) %>%
  add_rownames() %>%
  arrange(cyl, carb) %>%
  mutate(
    isfirst = (carb == 4 & (is.na(lag(carb)) | lag(carb) != 4))) %>%
  filter(isfirst)

The result:

#      rowname cyl carb isfirst
# 1  Mazda RX4   6    4    TRUE
# 2 Duster 360   8    4    TRUE
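
Two caveats, with a hedged sketch. First, add_rownames has since been deprecated in dplyr in favour of tibble::rownames_to_column. Second, without grouping, lag looks across cylinder boundaries: if one cyl block ended with a 4 and the next block started with one, the second would be missed. That does not happen in mtcars, but the variant below also compares the previous row's cyl to guard against it. The names prev_cyl, prev_carb and first_fours are made up for the example.

```r
library(dplyr)
library(tibble)

first_fours <- mtcars %>%
  rownames_to_column("rowname") %>%  # replaces the deprecated add_rownames()
  select(rowname, cyl, carb) %>%
  arrange(cyl, carb) %>%
  mutate(
    prev_cyl  = lag(cyl),
    prev_carb = lag(carb),
    # first 4 of a cyl block: there is no previous row, or the previous row
    # belongs to another cyl, or the previous row is not a 4
    isfirst = carb == 4 &
      (is.na(prev_carb) | prev_cyl != cyl | prev_carb != 4)
  ) %>%
  filter(isfirst) %>%
  select(rowname, cyl, carb, isfirst)

print(first_fours)
```

On mtcars this should return the same two rows as the result shown above.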

Answer 1: score 7, 2015-03-26 09:25:28

Answer 2: score 3, accepted, 2015-03-26 07:41:42

Answer 3: score 1, 2015-03-26 07:52:56
