I have a status_df
with id and status at each stage:
id | stage | status |
---|---|---|
15 | 1 | Pending |
15 | 2 | Not Sent |
16 | 1 | Approved |
16 | 2 | Rejected |
16 | 3 | Not Sent |
16 | 4 | Not Sent |
20 | 1 | Approved |
20 | 2 | Approved |
20 | 3 | Approved |
I am trying to do a group_by ID and apply the following logic:
any
stage for an ID has 'Pending' status, final_status column is 'Pending'any
stage for an ID has 'Rejected' status, final_status column is 'Rejected'all
stages for an ID are approved, final_status column is 'Approved'I am trying this (not working):
final_status_df = status_df %>% select(id, status) %>% group_by(id) %>%
mutate(final_status = case_when(any(status)=="Pending" ~ "Pending",
any(status)=="Rejected" ~ "Rejected",
all(status)=="Approved" ~ "Approved"))
Expected output (final_status_df):
id | final_status |
---|---|
15 | Pending |
16 | Rejected |
20 | Approved |
You were in the right direction with your attempt however, you closed any
/ all
brackets early before comparison ( ==
). Also since you only want 1 row for every id
you can use summarise
instead of mutate
which will also avoid the use of select
.
library(dplyr)
status_df %>%
group_by(id) %>%
summarise(final_status = case_when(any(status == "Pending") ~ "Pending",
any(status == "Rejected") ~ "Rejected",
all(status == "Approved") ~ "Approved"))
# id final_status
#* <int> <chr>
#1 15 Pending
#2 16 Rejected
#3 20 Approved
We can use summarise
instead of mutate
(as mutate
returns the output column with the same length
as the input column and it is used to create/modify a column instead of summarising).
Also, an easier option is to convert to factor
with levels
specified in the custom order, drop the unused levels ( droplevels
) and select the first
levels
after grouping by 'id'
library(dplyr)
status_df %>%
group_by(id) %>%
summarise(final_status = first(levels(droplevels(factor(status,
levels = c("Pending", "Rejected", "Approved"))))), .groups = 'drop')
-output
# A tibble: 3 x 2
# id final_status
# <int> <chr>
#1 15 Pending
#2 16 Rejected
#3 20 Approved
In the OP's code, any(status)
returns NA
, instead it should be wrapped on a logical vector ie any(status == "Pending")
. Also, as mentioned above, it would be summarise
instead of mutate
status_df <- structure(list(id = c(15L, 15L, 16L, 16L, 16L, 16L, 20L, 20L,
20L), stage = c(1L, 2L, 1L, 2L, 3L, 4L, 1L, 2L, 3L), status = c("Pending",
"Not Sent", "Approved", "Rejected", "Not Sent", "Not Sent", "Approved",
"Approved", "Approved")), class = "data.frame", row.names = c(NA,
-9L))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.